End-to-End Multi-Person Pose Estimation with Transformers

Bibliographic Details
Published in	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 11059 - 11068
Main Authors	Shi, Dahu; Wei, Xing; Li, Liangqi; Ren, Ye; Tan, Wenming
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2022
Subjects	Benchmark testing; Codes; Computer vision; Image analysis; Kinematics; Location awareness; Pose estimation; Pose estimation and tracking; Recognition: detection, categorization, retrieval; Scene analysis and understanding

Abstract	Current methods of multi-person pose estimation typically treat the localization and association of body joints separately. In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. Our method views pose estimation as a hierarchical set prediction problem and effectively removes the need for many hand-crafted modules like RoI cropping, NMS and grouping post-processing. In PETR, multiple pose queries are learned to directly reason a set of full-body poses. Then a joint decoder is utilized to further refine the poses by exploring the kinematic relations between body joints. With the attention mechanism, the proposed method is able to adaptively attend to the features most relevant to target keypoints, which largely overcomes the feature misalignment difficulty in pose estimation and improves the performance considerably. Extensive experiments on the MS COCO and CrowdPose benchmarks show that PETR plays favorably against state-of-the-art approaches in terms of both accuracy and efficiency. The code and models are available at https://github.com/hikvision-research/opera.
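Note on set prediction (illustrative sketch, not the authors' code): the abstract frames multi-person pose estimation as set prediction, which is what removes NMS and grouping. In DETR-style set prediction, training typically pairs each ground-truth person with exactly one prediction through one-to-one (bipartite) matching. Unmatched queries are then treated as "no person". The abstract does not state PETR's actual matching cost. The sketch below therefore assumes a simple mean L1 keypoint distance. The names `pose_l1_cost` and `match_queries_to_poses` are hypothetical. The official implementation lives in the opera repository linked in the abstract.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical representation: one (x, y) coordinate per body joint.
Keypoints = Sequence[Tuple[float, float]]


def pose_l1_cost(pred: Keypoints, gt: Keypoints) -> float:
    """Mean L1 distance over corresponding joints (an assumed matching cost)."""
    if len(pred) != len(gt):
        raise ValueError("both poses must have the same number of joints")
    total = sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(pred)


def match_queries_to_poses(
    preds: List[Keypoints], gts: List[Keypoints]
) -> List[Tuple[int, int]]:
    """Return (query_index, gt_index) pairs of a minimum-total-cost one-to-one matching.

    Brute force over all injective assignments, so only suitable for toy sizes.
    Queries that are not returned would be supervised as "no person".
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many pose queries as ground-truth people")
    best_cost, best_perm = float("inf"), ()
    # Each permutation assigns a distinct query index to every ground-truth pose.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pose_l1_cost(preds[q], gts[g]) for g, q in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return sorted((q, g) for g, q in enumerate(best_perm))


if __name__ == "__main__":
    # Three pose queries, two people in the image, two joints per pose.
    preds = [[(0, 0), (1, 1)], [(10, 10), (11, 11)], [(5, 5), (6, 6)]]
    gts = [[(10, 10), (11, 12)], [(0, 1), (1, 1)]]
    print(match_queries_to_poses(preds, gts))  # → [(0, 1), (1, 0)]; query 2 is unmatched
```

Because every person receives exactly one query, duplicate detections are penalized during training rather than being removed afterwards by NMS. A practical version would replace the exponential loop with a polynomial-time Hungarian solver such as SciPy's `scipy.optimize.linear_sum_assignment`.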
Author	Ren, Ye; Wei, Xing; Tan, Wenming; Li, Liangqi; Shi, Dahu
Author_xml	– sequence: 1 givenname: Dahu surname: Shi fullname: Shi, Dahu email: shidahu@hikvision.com organization: Hikvision Research Institute, Hangzhou, China
	– sequence: 2 givenname: Xing surname: Wei fullname: Wei, Xing email: weixing@mail.xjtu.edu.cn organization: School of Software Engineering, Xi'an Jiaotong University
	– sequence: 3 givenname: Liangqi surname: Li fullname: Li, Liangqi email: liliangqi@hikvision.com organization: Hikvision Research Institute, Hangzhou, China
	– sequence: 4 givenname: Ye surname: Ren fullname: Ren, Ye email: renye@hikvision.com organization: Hikvision Research Institute, Hangzhou, China
	– sequence: 5 givenname: Wenming surname: Tan fullname: Tan, Wenming email: tanwenming@hikvision.com organization: Hikvision Research Institute, Hangzhou, China
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/CVPR52688.2022.01079
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	1665469463 9781665469463
EISSN	2575-7075
EndPage	11068
ExternalDocumentID	9878630
Genre	orig-research
GrantInformation_xml	– fundername: National Natural Science Foundation of China grantid: 2020AAA0105600 funderid: 10.13039/501100001809
GroupedDBID	6IE 6IH 6IL 6IN ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO
ID	FETCH-LOGICAL-i203t-164f9e519170813b4656c61e49ce2025cd57fc959857be29dfb0680cd41484f93
IEDL.DBID	RIE
IngestDate	Wed Jun 26 19:28:27 EDT 2024
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i203t-164f9e519170813b4656c61e49ce2025cd57fc959857be29dfb0680cd41484f93
PageCount	10
ParticipantIDs	ieee_primary_9878630
PublicationCentury	2000
PublicationDate	2022-June
PublicationDateYYYYMMDD	2022-06-01
PublicationDate_xml	– month: 06 year: 2022 text: 2022-June
PublicationDecade	2020
PublicationTitle	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev	CVPR
PublicationYear	2022
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssib052007398 ssib042469789
Score	2.412112
SourceID	ieee
SourceType	Publisher
StartPage	11059
SubjectTerms	Benchmark testing; Codes; Computer vision; Image analysis; Kinematics; Location awareness; Pose estimation; Pose estimation and tracking; Recognition: detection, categorization, retrieval; Scene analysis and understanding
Title	End-to-End Multi-Person Pose Estimation with Transformers
URI	https://ieeexplore.ieee.org/document/9878630
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link.rule.ids	310,311,786,790,795,796,802,23958,23959,25170,27958,55109
linkProvider	IEEE
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2022+IEEE%2FCVF+Conference+on+Computer+Vision+and+Pattern+Recognition+%28CVPR%29&rft.atitle=End-to-End+Multi-Person+Pose+Estimation+with+Transformers&rft.au=Shi%2C+Dahu&rft.au=Wei%2C+Xing&rft.au=Li%2C+Liangqi&rft.au=Ren%2C+Ye&rft.date=2022-06-01&rft.pub=IEEE&rft.eissn=2575-7075&rft.spage=11059&rft.epage=11068&rft_id=info:doi/10.1109%2FCVPR52688.2022.01079&rft.externalDocID=9878630