Cross-attention-based hybrid ViT-CNN fusion network for action recognition in visible and infrared videos
Published in | Pattern Analysis and Applications (PAA), Vol. 28, No. 3 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Heidelberg: Springer Nature B.V., 01.09.2025 |
ISSN | 1433-7541 (print); 1433-755X (electronic) |
DOI | 10.1007/s10044-025-01493-y |
Summary | Human action recognition (HAR) in videos is a critical task in computer vision, but traditional methods relying solely on visible (RGB) data face challenges in low-light or occluded scenarios. Infrared (IR) imagery offers robustness in such conditions, yet effectively fusing the IR and visible modalities remains an open problem. To address this, we propose HVCCA-Net, a Hybrid ViT-CNN Cross-Attention Network that integrates the strengths of both modalities. Our framework consists of three key modules: (1) a video pre-processing (VPP) module that extracts IR/visible frames, stacked dense optical flow, and residual images; (2) an intra-modality spatio-temporal feature learning (ISTFL) module combining an Inflated 3D CNN (I3D), a Group Propagation Vision Transformer (GPViT), and a Bi-directional Long Short-Term Memory (BiLSTM) network to capture local and global features; and (3) a cross-modality multi-head attention fusion (CMHAF) module that dynamically aligns and fuses complementary features. Experiments on the Infrared-Visible dataset demonstrate state-of-the-art performance (96.0% accuracy), outperforming existing methods. The results highlight the effectiveness of our cross-attention mechanism in leveraging multimodal data for robust action recognition. The code and datasets of the proposed method are available at https://github.com/jvdgit/IR-Vis-Action-Recognition.git |
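To make the fusion idea in the summary concrete, the following is a minimal sketch of a cross-modality multi-head attention block of the kind the CMHAF module describes: each modality's feature sequence queries the other, and the two attended streams are combined. It assumes PyTorch; the class name, feature dimensions, and the concatenate-and-project fusion step are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of cross-modality multi-head attention fusion (CMHAF-style).
# Assumes PyTorch; names and the final fusion step are illustrative, not the
# authors' implementation.
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    """Bidirectional cross-attention between visible and infrared features:
    visible tokens attend to infrared tokens and vice versa, then the two
    attended streams are concatenated and projected back to the model dim."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.vis_to_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ir_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_ir = nn.LayerNorm(dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        # vis, ir: (batch, seq_len, dim) token sequences, e.g. produced by
        # the intra-modality branches (I3D/GPViT/BiLSTM) for each modality.
        vis_att, _ = self.vis_to_ir(query=vis, key=ir, value=ir)
        ir_att, _ = self.ir_to_vis(query=ir, key=vis, value=vis)
        vis_fused = self.norm_vis(vis + vis_att)  # residual connection + norm
        ir_fused = self.norm_ir(ir + ir_att)
        # Fuse the two attended streams into a single joint representation.
        return self.proj(torch.cat([vis_fused, ir_fused], dim=-1))

if __name__ == "__main__":
    fusion = CrossModalityFusion(dim=512, num_heads=8)
    vis = torch.randn(2, 16, 512)  # batch of 2 clips, 16 tokens each
    ir = torch.randn(2, 16, 512)
    print(fusion(vis, ir).shape)   # torch.Size([2, 16, 512])
```

The bidirectional query/key swap is what lets each modality dynamically weight the other's complementary cues (e.g. IR intensity under low light, RGB texture otherwise); how the paper combines the attended streams may differ from the simple concatenation used here.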