Hybrid attention adaptive sampling network for human pose estimation in videos

Bibliographic Details
Published in: Computer Animation and Virtual Worlds, Vol. 35, No. 4
Main Authors: Song, Qianyun; Zhang, Hao; Liu, Yanan; Sun, Shouzheng; Xu, Dan
Format: Journal Article
Language: English
Published: Chichester: Wiley Subscription Services, Inc., 01.07.2024

Summary: Human pose estimation in videos often relies on sampling strategies such as sparse uniform sampling and keyframe selection. Sparse uniform sampling can miss spatial-temporal relationships, while keyframe selection using CNNs struggles to fully capture these relationships and is computationally costly. Neither strategy ensures the reliability of the pose data produced by single-frame estimators. To address these issues, this article proposes an efficient and effective hybrid attention adaptive sampling network. The network includes a dynamic attention module and a pose quality attention module, which jointly consider a video's dynamic information and the quality of its pose data. It further improves efficiency through compact uniform sampling and the parallel mechanism of multi-head self-attention. The network is compatible with various video-based pose estimation frameworks, is more robust under heavy occlusion, motion blur, and illumination changes, and achieves state-of-the-art performance on the Sub-JHMDB dataset.
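
As a concrete illustration of the sampling idea the abstract describes, the sketch below scores a compact, uniformly sampled window of candidate frames with two branches, one for frame dynamics (multi-head self-attention) and one for pose quality, then adaptively keeps the top-k frames. This is a minimal PyTorch sketch under assumed details: the names (HybridAttentionSampler, quality_score), feature dimensions, and the additive score fusion are illustrative guesses, not the authors' implementation.

import torch
import torch.nn as nn

class HybridAttentionSampler(nn.Module):
    """Toy sketch: score candidate frames with a dynamic-attention branch
    and a pose-quality branch, then adaptively keep the top-k frames.
    All module names, dimensions, and the score-fusion rule are assumptions."""

    def __init__(self, feat_dim=256, num_heads=4, k=3):
        super().__init__()
        self.k = k
        # Dynamic branch: multi-head self-attention over the compact,
        # uniformly sampled window of candidate frame features.
        self.dynamic_attn = nn.MultiheadAttention(feat_dim, num_heads,
                                                  batch_first=True)
        # Pose-quality branch: a small MLP scoring each frame's pose features.
        self.quality_score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim)
        attn_out, attn_weights = self.dynamic_attn(frame_feats, frame_feats,
                                                   frame_feats)
        # Dynamic score: average attention each frame receives as a key.
        dynamic = attn_weights.mean(dim=1)                      # (B, T)
        quality = self.quality_score(frame_feats).squeeze(-1)   # (B, T)
        score = dynamic + torch.sigmoid(quality)  # assumed additive fusion
        top_idx = score.topk(self.k, dim=1).indices             # (B, k)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, frame_feats.size(-1))
        return torch.gather(attn_out, 1, gather_idx), top_idx

# Usage: adaptively pick 3 of 9 candidate frames per clip.
sampler = HybridAttentionSampler()
feats = torch.randn(2, 9, 256)
selected, indices = sampler(feats)
print(selected.shape, indices.shape)  # torch.Size([2, 3, 256]) torch.Size([2, 3])

Here the dynamic score is read off the self-attention weights (how much attention each frame receives as a key), so the two branches share a single parallel forward pass over the window; the paper's actual fusion of the two modules may differ.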
ISSN: 1546-4261
EISSN: 1546-427X
DOI: 10.1002/cav.2244