End-to-End Multi-Person Pose Estimation with Transformers

Bibliographic Details
Published in	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11059-11068
Main Authors	Shi, Dahu; Wei, Xing; Li, Liangqi; Ren, Ye; Tan, Wenming
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2022
Subjects	Benchmark testing; categorization; Codes; Computer vision; Image analysis; Kinematics; Location awareness; Pose estimation; Pose estimation and tracking; Recognition: detection retrieval; Scene analysis and understanding

Summary:	Current methods of multi-person pose estimation typically treat the localization and association of body joints separately. In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. Our method views pose estimation as a hierarchical set prediction problem and removes the need for many hand-crafted modules such as RoI cropping, NMS, and grouping post-processing. In PETR, multiple pose queries are learned to directly reason about a set of full-body poses. A joint decoder then refines the poses by exploring the kinematic relations between body joints. With the attention mechanism, the proposed method adaptively attends to the features most relevant to the target keypoints, which largely overcomes the feature misalignment difficulty in pose estimation and improves performance considerably. Extensive experiments on the MS COCO and CrowdPose benchmarks show that PETR performs favorably against state-of-the-art approaches in terms of both accuracy and efficiency. The code and models are available at https://github.com/hikvision-research/opera.
ISSN:	2575-7075
DOI:	10.1109/CVPR52688.2022.01079
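
Illustrative note: the summary describes pose estimation as a set prediction problem that needs no NMS. Set prediction models are commonly trained with a one-to-one matching between predicted and ground-truth items, so duplicate predictions are penalized rather than filtered afterwards. This record does not say how PETR performs that matching, so the following is only a minimal sketch under stated assumptions: the cost is a mean L1 keypoint distance, the assignment is found by brute force over small sets, and the names pose_cost and match_poses are hypothetical and not taken from the paper or its repository.

```python
from itertools import permutations


def pose_cost(pred, gt):
    """Mean L1 distance between two poses given as lists of (x, y) keypoints.

    Assumed cost for illustration only; not the loss used in the paper.
    """
    total = sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(gt)


def match_poses(preds, gts):
    """Find the one-to-one assignment of predictions to ground truths with minimal total cost.

    Brute force over all assignments, so it only suits tiny sets.
    Assumes len(preds) >= len(gts). Returns (pairs, cost), where pairs is a
    list of (pred_index, gt_index).
    """
    best_pairs, best_cost = [], float("inf")
    # Each candidate picks one distinct prediction for every ground-truth pose.
    for pred_ids in permutations(range(len(preds)), len(gts)):
        cost = sum(pose_cost(preds[p], gts[g]) for g, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return best_pairs, best_cost


# Three predicted poses (two keypoints each) and two ground-truth poses.
preds = [
    [(0.9, 0.9), (0.8, 0.7)],
    [(0.1, 0.1), (0.2, 0.3)],
    [(0.5, 0.5), (0.5, 0.6)],
]
gts = [
    [(0.1, 0.1), (0.2, 0.2)],
    [(0.9, 0.9), (0.8, 0.8)],
]
pairs, cost = match_poses(preds, gts)
print(sorted(pairs), round(cost, 3))
```

In this sketch the third prediction stays unmatched and would be treated as "no person" during training. A practical implementation would replace the brute-force search with a polynomial-time assignment solver.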