Hybrid attention adaptive sampling network for human pose estimation in videos
| Published in | Computer animation and virtual worlds, Vol. 35, no. 4 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Chichester: Wiley Subscription Services, Inc, 01.07.2024 |
Summary: Human pose estimation in videos often relies on sampling strategies such as sparse uniform sampling and keyframe selection. Sparse uniform sampling can miss spatial-temporal relationships, while CNN-based keyframe selection struggles to fully capture these relationships and is computationally costly. Neither strategy ensures the reliability of pose data produced by single-frame estimators. To address these issues, this article proposes an efficient and effective hybrid attention adaptive sampling network. The network includes a dynamic attention module and a pose quality attention module, which jointly account for dynamic information and the quality of pose data. It further improves efficiency through compact uniform sampling and the parallel mechanism of multi-head self-attention. The network is compatible with various video-based pose estimation frameworks, shows greater robustness under heavy occlusion, motion blur, and illumination changes, and achieves state-of-the-art performance on the Sub-JHMDB dataset.
The article introduces a hybrid attention adaptive sampling network for video-based human pose estimation, integrating dynamic and pose quality attention modules to improve data quality and the capture of dynamic information. This approach outperforms traditional sampling strategies, remaining robust under challenging conditions such as occlusion, motion blur, and illumination variation, and achieves state-of-the-art results on Sub-JHMDB.
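The record gives no implementation details, but the abstract's core idea (features of compactly, uniformly sampled frames processed in parallel by multi-head self-attention, with a pose-quality score down-weighting unreliable frames) can be illustrated. The following is a minimal PyTorch-style sketch under those assumptions; the class name, the quality head, the weighted-average aggregation, and the sampling helper are hypothetical illustrations, not the authors' code.

```python
import torch
import torch.nn as nn

class HybridAttentionSampler(nn.Module):
    """Toy hybrid-attention aggregator over sampled frame features (a sketch,
    not the paper's architecture)."""
    def __init__(self, feat_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Multi-head self-attention processes all sampled frames in parallel.
        self.mhsa = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Hypothetical pose-quality head: scores the reliability of each
        # frame's single-frame pose estimate in [0, 1].
        self.quality = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_sampled_frames, feat_dim)
        attended, _ = self.mhsa(frame_feats, frame_feats, frame_feats)
        w = self.quality(attended)  # (batch, num_sampled_frames, 1)
        # Quality-weighted average so unreliable frames contribute less.
        return (attended * w).sum(dim=1) / w.sum(dim=1).clamp_min(1e-6)

def compact_uniform_sample(num_frames: int, num_samples: int) -> torch.Tensor:
    # Spread num_samples indices evenly over the clip; one plausible reading
    # of "compact uniform sampling" -- the paper's exact scheme may differ.
    return torch.linspace(0, num_frames - 1, num_samples).round().long()

# Usage: sample 8 frames from a 64-frame clip, then aggregate their features.
idx = compact_uniform_sample(64, 8)
feats = torch.randn(2, 8, 256)            # stand-in for per-frame features
pooled = HybridAttentionSampler()(feats)  # (2, 256)
```

The quality-weighted pooling is one simple way to make unreliable single-frame estimates contribute less to the aggregated representation, mirroring the abstract's stated concern about pose-data reliability.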
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2244