CHAN: Skeleton based action recognition by multi‐level feature learning
| Published in | Computer animation and virtual worlds, Vol. 34, no. 6 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc, 01.11.2023 |
| Subjects | |
Summary: Skeleton-based action recognition has been studied continuously and intensively. However, dynamic 3D skeleton data are difficult to deploy in practical applications because of restricted data-acquisition conditions. Although action recognition based on 2D pose information extracted from RGB video can effectively avoid the influence of complex backgrounds, it is susceptible to factors such as video jitter and joint overlap. To reduce the interference of these factors, we use two-dimensional skeletal joint coordinates to represent changes in human body posture. First, we apply a target detector and a pose-estimation algorithm to obtain the joint coordinates of each frame sampled from RGB video. Then a feature-extraction network performs multi-level feature learning to establish correspondences between actions and their multi-level features. Finally, a hierarchical attention mechanism is introduced to build the model, named CHAN. By computing the associations between elements, the weights for action classification are redistributed. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method.
Action recognition based on skeletons has been studied extensively. However, owing to limitations in data-collection conditions, dynamic 3D skeletal data are hard to generalize to practical applications. This paper focuses on the modal information of two-dimensional skeletal joint coordinates, which represents changes in human posture, and proposes a hierarchical attention model called CHAN. First, object detectors and pose-estimation algorithms obtain the joint coordinates of each frame from RGB videos. Then a feature-extraction network based on locally connected CNNs performs multi-level feature learning, establishing correspondences between actions and their multi-level features. Next, a hierarchical attention network (HAN) explores the importance of the detailed information of each joint: by computing the correlations among elements, the weights for action classification are re-allocated. Finally, a classifier produces the output classification. Combining multi-level features with attention mechanisms not only captures long sequences of skeletal-joint information but also lets the model learn co-occurrence features more effectively and represent motion information more accurately.
| ISSN | 1546-4261; 1546-427X |
|---|---|
| DOI | 10.1002/cav.2193 |
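The summary states that the attention stage re-weights action-classification features "by computing the associations between elements" over per-joint features. The record does not give the exact architecture, so the following is only a minimal sketch of that idea using single-head scaled dot-product attention over per-joint feature vectors; the shapes (17 COCO-style joints, 64-dimensional features) and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_reweight(feats):
    """Re-weight per-joint features by their pairwise associations.

    feats: (J, D) array, one D-dimensional feature vector per joint.
    Returns an array of the same shape, where each joint's feature is
    a softmax-weighted mixture of all joints' features (illustrative
    stand-in for the paper's hierarchical attention stage).
    """
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)   # pairwise associations, (J, J)
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ feats                  # re-weighted features, (J, D)

# Hypothetical input: 17 COCO-style joints with 64-dim features each.
joints = np.random.default_rng(0).normal(size=(17, 64))
out = attention_reweight(joints)
print(out.shape)  # (17, 64)
```

In a full model along the lines the abstract describes, such a block would sit between the multi-level feature extractor and the classifier, so that joints whose features correlate strongly with discriminative motion patterns contribute more to the final class scores.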