Attention-based network for effective action recognition from multi-view video

Bibliographic Details
Published in: Procedia Computer Science, Vol. 192, pp. 971-980
Main Authors: Nguyen, Hoang-Thuyen; Nguyen, Thi-Oanh
Format: Journal Article
Language: English
Published: Elsevier B.V., 2021
ISSN: 1877-0509
DOI: 10.1016/j.procs.2021.08.100

Summary: Human action recognition systems face many challenges, such as background clutter, partial occlusion, lighting, viewpoint changes, and execution rate. Using complementary information from different views can mitigate viewpoint-change and occlusion problems, but the question remains of how to effectively integrate information from multi-view images. In this paper, we propose an effective approach for multi-view human action recognition. The approach uses an attention mechanism to pass discriminative features between views. It is designed as a multi-branch network in which each branch is responsible for extracting a view-specific feature. Furthermore, we build a cross-view attention module that enhances action recognition by transferring knowledge between views (branches). Experiments on three datasets show that the proposed solution works effectively in different scenarios. Our models achieve the best results on two datasets (NUMA and MicaHandGesture) for both cross-subject and cross-view evaluations. On the NUMA dataset, the accuracy of our best models reaches 99.56% and 92.74% in the cross-subject and cross-view evaluation scenarios, respectively; on the MicaHandGesture dataset, the accuracies are 99.06% and 91.71% in the two scenarios, respectively. These results surpass previous works such as Multi-Branch TSN with GRU [5] (93.81% in cross-subject evaluation and 84.4% in cross-view evaluation on NUMA) and DA-Net [31] (92.1% in cross-subject evaluation (video-level) and 84.2% in cross-view evaluation on the NUMA dataset). We also obtain very promising results on the large-scale NTU RGB+D dataset.
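
The abstract describes the cross-view attention module only at a high level. The following PyTorch-style sketch is an illustration of the general idea, not the authors' published implementation: two view-specific branches exchange attention-weighted features, with queries taken from one view and keys/values from the other. All module names, dimensions, and layer choices here are hypothetical assumptions.

```python
# Illustrative sketch of cross-view attention between two view branches.
# Not the authors' architecture; names and dimensions are hypothetical.
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Lets one view's features attend to another view's features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # own, other: (batch, tokens, dim) view-specific feature sequences.
        # Queries come from this view; keys/values from the other view,
        # so discriminative information is transferred across branches.
        transferred, _ = self.attn(query=own, key=other, value=other)
        return self.norm(own + transferred)  # residual fusion


class TwoBranchNet(nn.Module):
    """Two view-specific branches joined by cross-view attention."""

    def __init__(self, dim: int = 256, num_classes: int = 10):
        super().__init__()
        self.branch_a = nn.Linear(dim, dim)   # stand-in for a per-view backbone
        self.branch_b = nn.Linear(dim, dim)
        self.cross_ab = CrossViewAttention(dim)
        self.cross_ba = CrossViewAttention(dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
        fa, fb = self.branch_a(view_a), self.branch_b(view_b)
        fa = self.cross_ab(fa, fb)            # view A enriched with view B
        fb = self.cross_ba(fb, fa)            # view B enriched with view A
        pooled = torch.cat([fa.mean(dim=1), fb.mean(dim=1)], dim=-1)
        return self.classifier(pooled)


if __name__ == "__main__":
    model = TwoBranchNet()
    a = torch.randn(2, 16, 256)  # (batch, tokens per clip, feature dim)
    b = torch.randn(2, 16, 256)
    print(model(a, b).shape)     # torch.Size([2, 10])
```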