Fusing Channel State Information and Computer Vision in an Agentic AI Workflow for Accurate Human Activity Recognition and Crisis Prevention

Publication Date

3-3-2026

Document Type

Conference Proceeding

Publication Title

ICCIDS 2026: 9th International Conference on Computational Intelligence in Data Science

DOI

10.1109/ICCIDS69108.2026.11407639

Abstract

This paper presents a novel multimodal framework for Human Activity Recognition (HAR) in home environments. The framework combines Channel State Information (CSI) from WiFi access points with visual data from camera images. It uses a dual-stream classification pipeline: a Transformer-based deep learning model processes CSI features, and a fine-tuned Convolutional Neural Network (CNN) classifies activities from images. To enable autonomous decision-making, the framework incorporates an agentic AI module based on the Falcon Large Language Model (LLM), which fuses the predictions of both modalities through a reasoning-driven mechanism. The integrated system recognizes activities such as walking, sitting, standing, and falling, and is particularly effective in scenarios that require real-time monitoring and emergency detection. Agentic AI-based fusion is especially important in this setting because false negatives, such as a missed fall, can be dangerous. Experimental results demonstrate that fusing radio-frequency (CSI) and visual cues significantly improves recognition accuracy compared with unimodal approaches, offering a robust solution for intelligent, non-intrusive HAR systems.
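The abstract describes decision-level fusion of two per-class predictions (CSI and vision) with particular concern for missed falls. The following is a minimal sketch of how such a fusion step could be structured. It is not the authors' method: the paper fuses predictions with a Falcon LLM, whereas this sketch only (a) formats both modalities' probabilities into a prompt an LLM could receive and (b) substitutes a plain weighted-average rule with a recall-favoring "falling" override as a deterministic stand-in. All names (`ModalityPrediction`, `build_fusion_prompt`, `rule_based_fusion`), the weight, and the threshold are hypothetical and illustrative.

```python
from dataclasses import dataclass

# Activity classes named in the abstract.
ACTIVITIES = ("walking", "sitting", "standing", "falling")


@dataclass
class ModalityPrediction:
    """Per-class probabilities from one modality (hypothetical structure)."""
    source: str               # e.g. "csi" or "vision" (illustrative labels)
    probs: dict[str, float]   # activity label -> probability


def build_fusion_prompt(csi: ModalityPrediction, vision: ModalityPrediction) -> str:
    """Format both modalities' outputs as text that could be sent to an LLM.

    The prompt wording is an assumption, not taken from the paper.
    """
    lines = ["You are monitoring a home for human activity and emergencies."]
    for pred in (csi, vision):
        ranked = sorted(pred.probs.items(), key=lambda kv: kv[1], reverse=True)
        formatted = ", ".join(f"{label}={p:.2f}" for label, p in ranked)
        lines.append(f"{pred.source} model probabilities: {formatted}")
    lines.append("Reply with one activity from: " + ", ".join(ACTIVITIES))
    return "\n".join(lines)


def rule_based_fusion(
    csi: ModalityPrediction,
    vision: ModalityPrediction,
    fall_threshold: float = 0.3,   # hypothetical; lower means fewer missed falls
    csi_weight: float = 0.5,       # hypothetical weight on the CSI stream
) -> tuple[str, dict[str, float]]:
    """Deterministic stand-in for LLM fusion: weighted average plus fall override."""
    # Weighted average of the two probability vectors.
    fused = {
        a: csi_weight * csi.probs.get(a, 0.0)
        + (1.0 - csi_weight) * vision.probs.get(a, 0.0)
        for a in ACTIVITIES
    }
    # Recall-favoring override: if either modality gives "falling" at least
    # fall_threshold, report a fall even when another class has a higher fused score.
    if max(csi.probs.get("falling", 0.0), vision.probs.get("falling", 0.0)) >= fall_threshold:
        return "falling", fused
    return max(fused, key=fused.get), fused


if __name__ == "__main__":
    csi = ModalityPrediction("csi", {"walking": 0.1, "sitting": 0.1, "standing": 0.45, "falling": 0.35})
    vision = ModalityPrediction("vision", {"walking": 0.1, "sitting": 0.2, "standing": 0.6, "falling": 0.1})
    print(build_fusion_prompt(csi, vision))
    print(rule_based_fusion(csi, vision))
```

In the example, "standing" has the highest averaged score, but the CSI stream's 0.35 fall probability exceeds the illustrative threshold, so the rule reports "falling". This trades extra false alarms for fewer missed falls, which is the same trade-off the abstract raises about false negatives. An LLM-based agent, as in the paper, would replace this fixed rule with reasoning over the prompt.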

Keywords

Agentic AI, Channel State Information, Computer Vision Models, Deep Learning, Fall Detection, Human Activity Recognition, Multimodal Fusion

Department

Applied Data Science