Multi‐modal fusion method for human action recognition based on IALC

Bibliographic Details
Published in: IET Image Processing, Vol. 17, No. 2, pp. 388–400
Main Authors: Zhang, Yinhuan; Xiao, Qinkun; Liu, Xing; Wei, Yongquan; Chu, Chaoqin; Xue, Jingyun
Format: Journal Article
Language: English
Published: Wiley, 01.02.2023

Summary: In occlusion and interaction scenarios, human action recognition (HAR) accuracy is low. To address this issue, this paper proposes a novel multi‐modal fusion framework for HAR. Within this framework, a module called improved attention long short‐term memory (IAL) is proposed, which combines an improved SE‐ResNet50 (ISE‐ResNet50) with long short‐term memory (LSTM). IAL extracts both the video sequence features and the skeleton sequence features of human behaviour. To improve HAR performance at a high semantic level, the obtained multi‐modal sequence features are fed into a coupled hidden Markov model (CHMM), yielding a multi‐modal IAL+CHMM method, called IALC, built on a probabilistic graphical model. To test the proposed method, experiments are conducted on the HMDB51, UCF101, Kinetics 400k, and ActivityNet datasets, where it achieves recognition accuracies of 86.40%, 97.78%, 81.12%, and 69.36%, respectively. The experimental results show that in complex environments, the proposed IALC‐based multi‐modal fusion method for HAR achieves more accurate recognition results.
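
The IAL module described above pairs a channel-attention CNN (ISE‐ResNet50) with an LSTM over frame sequences. As a rough illustration of that pattern only, here is a minimal PyTorch sketch: a squeeze-and-excitation block on top of a stock torchvision resnet50 backbone, followed by an LSTM with soft temporal attention. The backbone choice, layer sizes, and attention form are assumptions for illustration, not the authors' actual ISE‐ResNet50.

```python
# Minimal sketch of an attention-LSTM sequence-feature extractor in the
# spirit of IAL. Assumed, not the paper's implementation: torchvision's
# resnet50 backbone, SE reduction ratio 16, hidden size 512, additive
# soft attention over time steps.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.pool(x).flatten(1)             # squeeze -> (B, C)
        w = self.fc(w).view(x.size(0), -1, 1, 1)
        return x * w                            # excite: channel reweighting

class AttentionLSTM(nn.Module):
    """Per-frame CNN features -> LSTM -> soft attention over time steps."""
    def __init__(self, feat_dim: int = 2048, hidden: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # conv maps only
        self.se = SEBlock(feat_dim)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)        # scores each time step

    def forward(self, clip):                    # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        x = self.cnn(clip.flatten(0, 1))        # (B*T, C, h, w)
        x = self.gap(self.se(x)).flatten(1)     # (B*T, C)
        h, _ = self.lstm(x.view(b, t, -1))      # (B, T, hidden)
        a = torch.softmax(self.attn(h), dim=1)  # (B, T, 1) attention weights
        return (a * h).sum(dim=1)               # (B, hidden) sequence feature

seq_feat = AttentionLSTM()(torch.randn(2, 8, 3, 224, 224))
print(seq_feat.shape)  # torch.Size([2, 512])
```

In the paper's pipeline, one such stream would process RGB frames and a second would process skeleton sequences; the two resulting sequence features are then fused at the semantic level by the CHMM, sketched next.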
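
The fusion stage feeds both feature streams into a coupled hidden Markov model. The NumPy sketch below shows the forward (likelihood) recursion of a generic two-chain CHMM over discrete symbols, assuming the continuous sequence features have first been vector-quantized; the state counts, coupling structure, and parameters are illustrative assumptions, not values from the paper.

```python
# Forward pass of a two-chain coupled HMM: each chain's next state is
# conditioned on BOTH chains' previous states, which is what couples the
# two modalities. All dimensions below are illustrative.
import numpy as np

def chmm_forward(pi_a, pi_b, T_a, T_b, B_a, B_b, obs_a, obs_b):
    """Joint log-likelihood of two coupled observation streams.

    pi_a (Na,), pi_b (Nb,)      initial state distributions
    T_a (Na, Nb, Na)            P(a_t | a_{t-1}, b_{t-1})
    T_b (Na, Nb, Nb)            P(b_t | a_{t-1}, b_{t-1})
    B_a (Na, Ka), B_b (Nb, Kb)  discrete emission matrices
    obs_a, obs_b                symbol sequences of equal length
    """
    alpha = np.outer(pi_a * B_a[:, obs_a[0]], pi_b * B_b[:, obs_b[0]])
    s = alpha.sum()
    log_like, alpha = np.log(s), alpha / s      # scale to avoid underflow
    for xa, xb in zip(obs_a[1:], obs_b[1:]):
        # Predict the joint state, summing over both previous states.
        alpha = np.einsum('ij,ija,ijb->ab', alpha, T_a, T_b)
        alpha *= np.outer(B_a[:, xa], B_b[:, xb])   # fold in emissions
        s = alpha.sum()
        log_like, alpha = log_like + np.log(s), alpha / s
    return log_like

def rows_to_dist(x):  # normalize the last axis into a distribution
    return x / x.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Na, Nb, Ka, Kb, T = 3, 4, 8, 8, 10
ll = chmm_forward(
    rows_to_dist(rng.random(Na)), rows_to_dist(rng.random(Nb)),
    rows_to_dist(rng.random((Na, Nb, Na))), rows_to_dist(rng.random((Na, Nb, Nb))),
    rows_to_dist(rng.random((Na, Ka))), rows_to_dist(rng.random((Nb, Kb))),
    rng.integers(0, Ka, T), rng.integers(0, Kb, T))
print(ll)  # joint log-likelihood of the two coupled streams
```

Recognition with such a model would score each candidate action by the log-likelihood its trained CHMM assigns to the observed pair of sequences and pick the highest-scoring class.
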
ISSN: 1751-9659, 1751-9667
DOI: 10.1049/ipr2.12640