End-to-End Multi-Person Pose Estimation with Transformers
Published in | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 11059 - 11068 |
---|---|
Main Authors | Shi, Dahu; Wei, Xing; Li, Liangqi; Ren, Ye; Tan, Wenming |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2022 |
Subjects | |
Abstract | Current methods of multi-person pose estimation typically treat the localization and association of body joints separately. In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. Our method views pose estimation as a hierarchical set prediction problem and effectively removes the need for many hand-crafted modules, such as RoI cropping, NMS, and grouping post-processing. In PETR, multiple pose queries are learned to directly reason about a set of full-body poses. A joint decoder then further refines the poses by exploring the kinematic relations between body joints. With the attention mechanism, the proposed method can adaptively attend to the features most relevant to target keypoints, which largely overcomes the feature misalignment difficulty in pose estimation and considerably improves performance. Extensive experiments on the MS COCO and CrowdPose benchmarks show that PETR performs favorably against state-of-the-art approaches in terms of both accuracy and efficiency. The code and models are available at https://github.com/hikvision-research/opera. |
---|---|
Author | Ren, Ye; Wei, Xing; Tan, Wenming; Li, Liangqi; Shi, Dahu |
Author_xml | – sequence: 1 givenname: Dahu surname: Shi fullname: Shi, Dahu email: shidahu@hikvision.com organization: Hikvision Research Institute, Hangzhou, China – sequence: 2 givenname: Xing surname: Wei fullname: Wei, Xing email: weixing@mail.xjtu.edu.cn organization: School of Software Engineering, Xi'an Jiaotong University – sequence: 3 givenname: Liangqi surname: Li fullname: Li, Liangqi email: liliangqi@hikvision.com organization: Hikvision Research Institute, Hangzhou, China – sequence: 4 givenname: Ye surname: Ren fullname: Ren, Ye email: renye@hikvision.com organization: Hikvision Research Institute, Hangzhou, China – sequence: 5 givenname: Wenming surname: Tan fullname: Tan, Wenming email: tanwenming@hikvision.com organization: Hikvision Research Institute, Hangzhou, China |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/CVPR52688.2022.01079 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 1665469463 9781665469463 |
EISSN | 2575-7075 |
EndPage | 11068 |
ExternalDocumentID | 9878630 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 2020AAA0105600 funderid: 10.13039/501100001809 |
GroupedDBID | 6IE 6IH 6IL 6IN ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
ID | FETCH-LOGICAL-i203t-164f9e519170813b4656c61e49ce2025cd57fc959857be29dfb0680cd41484f93 |
IEDL.DBID | RIE |
IngestDate | Wed Jun 26 19:28:27 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i203t-164f9e519170813b4656c61e49ce2025cd57fc959857be29dfb0680cd41484f93 |
PageCount | 10 |
ParticipantIDs | ieee_primary_9878630 |
PublicationCentury | 2000 |
PublicationDate | 2022-June |
PublicationDateYYYYMMDD | 2022-06-01 |
PublicationDate_xml | – month: 06 year: 2022 text: 2022-June |
PublicationDecade | 2020 |
PublicationTitle | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2022 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssib052007398 ssib042469789 |
Score | 2.412112 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 11059 |
SubjectTerms | Benchmark testing; Codes; Computer vision; Image analysis; Kinematics; Location awareness; Pose estimation; Pose estimation and tracking; Recognition: detection, categorization, retrieval; Scene analysis and understanding |
Title | End-to-End Multi-Person Pose Estimation with Transformers |
URI | https://ieeexplore.ieee.org/document/9878630 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link.rule.ids | 310,311,786,790,795,796,802,23958,23959,25170,27958,55109 |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2022+IEEE%2FCVF+Conference+on+Computer+Vision+and+Pattern+Recognition+%28CVPR%29&rft.atitle=End-to-End+Multi-Person+Pose+Estimation+with+Transformers&rft.au=Shi%2C+Dahu&rft.au=Wei%2C+Xing&rft.au=Li%2C+Liangqi&rft.au=Ren%2C+Ye&rft.date=2022-06-01&rft.pub=IEEE&rft.eissn=2575-7075&rft.spage=11059&rft.epage=11068&rft_id=info:doi/10.1109%2FCVPR52688.2022.01079&rft.externalDocID=9878630 |
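The abstract frames multi-person pose estimation as set prediction, with learned pose queries replacing NMS and grouping post-processing. This record does not state PETR's training objective; assuming a DETR-style bipartite matching (an assumption, not confirmed by the record), each ground-truth pose is assigned to exactly one query, so duplicate predictions are penalized during training rather than suppressed afterward. The minimal Python sketch below illustrates only that one-to-one assignment. The names `pose_l1_cost` and `match_queries`, the mean-L1 keypoint cost, and the brute-force search (standing in for the Hungarian algorithm) are all hypothetical choices made for illustration, not PETR's actual implementation.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]
Pose = Sequence[Keypoint]


def pose_l1_cost(pred: Pose, gt: Pose) -> float:
    """Hypothetical matching cost: mean L1 distance over corresponding keypoints."""
    if len(pred) != len(gt):
        raise ValueError("poses must have the same number of keypoints")
    total = sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(gt)


def match_queries(preds: List[Pose], gts: List[Pose]) -> List[Tuple[int, int]]:
    """Return sorted (query_index, gt_index) pairs with minimum total cost.

    Brute force over every injective assignment of ground-truth poses to
    queries; only practical for tiny inputs. Queries left unassigned would,
    in DETR-style training, be supervised as "no pose".
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many queries as ground-truth poses")
    best_cost = float("inf")
    best: List[Tuple[int, int]] = []
    # Each permutation picks a distinct query index for every ground-truth pose.
    for chosen in permutations(range(len(preds)), len(gts)):
        cost = sum(pose_l1_cost(preds[q], gts[g]) for g, q in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best = [(q, g) for g, q in enumerate(chosen)]
    return sorted(best)


if __name__ == "__main__":
    # Three queries, two people: queries 0 and 2 both land near person 1.
    preds = [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 5.0), (6.0, 6.0)], [(0.1, 0.0), (1.0, 1.1)]]
    gts = [[(5.0, 5.0), (6.0, 6.0)], [(0.0, 0.0), (1.0, 1.0)]]
    print(match_queries(preds, gts))  # [(0, 1), (1, 0)]
```

In the usage example, query 2 is a near-duplicate of query 0 but receives no ground truth, which is the property that lets a set-prediction model drop NMS. A practical implementation would replace the permutation search with a polynomial-time assignment solver, for instance SciPy's `scipy.optimize.linear_sum_assignment`.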