CHAN: Skeleton based action recognition by multi‐level feature learning
| Published in | Computer animation and virtual worlds, Vol. 34, no. 6 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc, 01.11.2023 |
| Subjects | |
Summary: Skeleton-based action recognition has been studied continuously and intensively. However, dynamic 3D skeleton data are difficult to deploy in practical applications because of restricted data-acquisition conditions. Although action recognition based on 2D pose information extracted from RGB video can effectively avoid the influence of complex backgrounds, it is susceptible to factors such as video jitter and joint overlap. To reduce the interference of these factors, we use two-dimensional skeletal joint coordinates to represent changes in human body posture. First, we apply a target detector and a pose-estimation algorithm to obtain the joint coordinates of each frame sampled from RGB video. Then a feature-extraction network performs multi-level feature learning to establish correspondences between actions and their multi-level features. Finally, a hierarchical attention mechanism is introduced to build the model, named CHAN. By computing the associations between elements, the weights for action classification are redistributed. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method.
Action recognition based on skeletons has been studied extensively. However, owing to limitations in data-collection conditions, dynamic 3D skeletal data are hard to generalize to practical applications. This paper focuses on the modal information of two-dimensional skeletal joint coordinates, which represents changes in human posture, and proposes a hierarchical attention model called CHAN. First, object detectors and pose-estimation algorithms obtain the joint coordinates of each frame from RGB videos. Then a feature-extraction network based on locally connected CNNs performs multi-level feature learning, establishing correspondences between actions and their multi-level features. Next, a hierarchical attention network (HAN) explores the importance of the detailed information of each joint: by computing the correlations among elements, the weights for action classification are re-allocated. Finally, a classifier produces the output classification. Combining multi-level features with attention mechanisms not only captures long sequences of skeletal-joint information but also lets the model learn co-occurrence features more effectively and represent motion information more accurately.
| ISSN | 1546-4261; 1546-427X |
|---|---|
| DOI | 10.1002/cav.2193 |
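The summary states that the attention stage re-weights action-classification features "by computing the associations between elements" over per-joint features. The record does not give the exact architecture, so the following is only a minimal sketch of that idea using single-head scaled dot-product attention over per-joint feature vectors; the shapes (17 COCO-style joints, 64-dimensional features) and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_reweight(feats):
    """Re-weight per-joint features by their pairwise associations.

    feats: (J, D) array, one D-dimensional feature vector per joint.
    Returns an array of the same shape, where each joint's feature is
    a softmax-weighted mixture of all joints' features (illustrative
    stand-in for the paper's hierarchical attention stage).
    """
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)   # pairwise associations, (J, J)
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ feats                  # re-weighted features, (J, D)

# Hypothetical input: 17 COCO-style joints with 64-dim features each.
joints = np.random.default_rng(0).normal(size=(17, 64))
out = attention_reweight(joints)
print(out.shape)  # (17, 64)
```

In a full model along the lines the abstract describes, such a block would sit between the multi-level feature extractor and the classifier, so that joints whose features correlate strongly with discriminative motion patterns contribute more to the final class scores.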