Hybrid attention adaptive sampling network for human pose estimation in videos

Bibliographic Details
Published in: Computer Animation and Virtual Worlds, Vol. 35, No. 4
Main Authors: Song, Qianyun; Zhang, Hao; Liu, Yanan; Sun, Shouzheng; Xu, Dan
Format: Journal Article
Language: English
Published: Chichester: Wiley Subscription Services, Inc., 01.07.2024

Summary: Human pose estimation in videos often relies on sampling strategies such as sparse uniform sampling and keyframe selection. Sparse uniform sampling can miss spatial-temporal relationships, while keyframe selection using CNNs struggles to fully capture these relationships and is computationally costly. Neither strategy ensures the reliability of the pose data produced by single-frame estimators. To address these issues, this article proposes an efficient and effective hybrid attention adaptive sampling network. The network includes a dynamic attention module and a pose quality attention module, which jointly consider a video's dynamic information and the quality of its pose data. It further improves efficiency through compact uniform sampling and the parallel mechanism of multi-head self-attention. The network is compatible with various video-based pose estimation frameworks, is more robust under heavy occlusion, motion blur, and illumination changes, and achieves state-of-the-art performance on the Sub-JHMDB dataset.
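
As a concrete illustration of the sampling idea the abstract describes, the sketch below scores a compact, uniformly sampled window of candidate frames with two branches, one for frame dynamics (multi-head self-attention) and one for pose quality, then adaptively keeps the top-k frames. This is a minimal PyTorch sketch under assumed details: the names (HybridAttentionSampler, quality_score), feature dimensions, and the additive score fusion are illustrative guesses, not the authors' implementation.

import torch
import torch.nn as nn

class HybridAttentionSampler(nn.Module):
    """Toy sketch: score candidate frames with a dynamic-attention branch
    and a pose-quality branch, then adaptively keep the top-k frames.
    All module names, dimensions, and the score-fusion rule are assumptions."""

    def __init__(self, feat_dim=256, num_heads=4, k=3):
        super().__init__()
        self.k = k
        # Dynamic branch: multi-head self-attention over the compact,
        # uniformly sampled window of candidate frame features.
        self.dynamic_attn = nn.MultiheadAttention(feat_dim, num_heads,
                                                  batch_first=True)
        # Pose-quality branch: a small MLP scoring each frame's pose features.
        self.quality_score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim)
        attn_out, attn_weights = self.dynamic_attn(frame_feats, frame_feats,
                                                   frame_feats)
        # Dynamic score: average attention each frame receives as a key.
        dynamic = attn_weights.mean(dim=1)                      # (B, T)
        quality = self.quality_score(frame_feats).squeeze(-1)   # (B, T)
        score = dynamic + torch.sigmoid(quality)  # assumed additive fusion
        top_idx = score.topk(self.k, dim=1).indices             # (B, k)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, frame_feats.size(-1))
        return torch.gather(attn_out, 1, gather_idx), top_idx

# Usage: adaptively pick 3 of 9 candidate frames per clip.
sampler = HybridAttentionSampler()
feats = torch.randn(2, 9, 256)
selected, indices = sampler(feats)
print(selected.shape, indices.shape)  # torch.Size([2, 3, 256]) torch.Size([2, 3])

Here the dynamic score is read off the self-attention weights (how much attention each frame receives as a key), so the two branches share a single parallel forward pass over the window; the paper's actual fusion of the two modules may differ.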
ISSN: 1546-4261
EISSN: 1546-427X
DOI: 10.1002/cav.2244