Multi‐modal fusion method for human action recognition based on IALC
Published in | IET Image Processing, Vol. 17, No. 2, pp. 388–400 |
---|---|
Format | Journal Article |
Language | English |
Published | Wiley, 01.02.2023 |
Summary: | In occlusion and interaction scenarios, human action recognition (HAR) accuracy is low. To address this issue, this paper proposes a novel multi‐modal fusion framework for HAR. In this framework, a module called improved attention long short‐term memory (IAL) is proposed, which combines the improved SE‐ResNet50 (ISE‐ResNet50) with long short‐term memory (LSTM). IAL can extract the video sequence features and the skeleton sequence features of human behaviour. To improve the performance of HAR at a high semantic level, the obtained multi‐modal sequence features are fed into a coupled hidden Markov model (CHMM), and a multi‐modal IAL+CHMM method called IALC is developed based on a probabilistic graphical model. To test the performance of the proposed method, experiments are conducted on the HMDB51, UCF101, Kinetics‐400, and ActivityNet datasets, and the obtained recognition accuracies are 86.40%, 97.78%, 81.12%, and 69.36%, respectively. The experimental results show that when the environment is complex, the proposed multi‐modal fusion method for HAR based on IALC can achieve more accurate recognition results. |
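The abstract describes a two-stream pipeline: per-frame video and skeleton features are summarised over time by an attention mechanism, and the resulting multi-modal features are fused for classification. The following is a minimal NumPy sketch of that general idea only (attention-weighted temporal pooling of two modality streams followed by feature-level concatenation); it is not the authors' IAL or CHMM implementation, and all array shapes, the attention query `w`, and the random stand-in features are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(seq, w):
    # seq: (T, D) per-frame features; w: (D,) attention query vector
    scores = softmax(seq @ w)   # (T,) one weight per time step, sums to 1
    return scores @ seq         # (D,) attention-weighted temporal summary

rng = np.random.default_rng(0)
T, D = 16, 8
video_feats = rng.normal(size=(T, D))  # stand-in for video-stream features
skel_feats = rng.normal(size=(T, D))   # stand-in for skeleton-stream features
w = rng.normal(size=D)                 # hypothetical shared attention query

# late fusion by concatenating the pooled per-modality summaries
fused = np.concatenate([attention_pool(video_feats, w),
                        attention_pool(skel_feats, w)])
print(fused.shape)  # (16,)
```

In the paper's full method, the pooled features would instead be passed to a coupled hidden Markov model that links the two modality chains at the state level; that probabilistic fusion step is omitted here for brevity.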
---|---|
ISSN: | 1751-9659 1751-9667 |
DOI: | 10.1049/ipr2.12640 |