End-to-End Multi-Person Pose Estimation with Transformers

Bibliographic Details
Published in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11059 - 11068
Main Authors Shi, Dahu; Wei, Xing; Li, Liangqi; Ren, Ye; Tan, Wenming
Format Conference Proceeding
Language English
Published IEEE 01.06.2022
Abstract Current methods of multi-person pose estimation typically treat the localization and association of body joints separately. In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. Our method views pose estimation as a hierarchical set prediction problem and effectively removes the need for many hand-crafted modules like RoI cropping, NMS and grouping post-processing. In PETR, multiple pose queries are learned to directly reason about a set of full-body poses. Then a joint decoder is utilized to further refine the poses by exploring the kinematic relations between body joints. With the attention mechanism, the proposed method is able to adaptively attend to the features most relevant to the target keypoints, which largely overcomes the feature misalignment difficulty in pose estimation and considerably improves performance. Extensive experiments on the MS COCO and CrowdPose benchmarks show that PETR performs favorably against state-of-the-art approaches in terms of both accuracy and efficiency. The code and models are available at https://github.com/hikvision-research/opera.
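The abstract describes a hierarchical set-prediction design: a fixed set of learned pose queries first regresses coarse full-body poses from image features, and a joint-level decoder then refines each pose by attending over its individual keypoints to capture kinematic relations. The sketch below is only a minimal PyTorch illustration of that two-stage idea; it is not the authors' implementation (which is available in the linked opera repository). All module names, dimensions, and the residual refinement head are assumptions, and positional encodings, classification heads, deformable attention, and the Hungarian-matching set loss are omitted.

```python
# Illustrative sketch of a "pose decoder -> joint decoder" pipeline as described
# in the abstract. NOT the authors' code; names and dimensions are hypothetical.
import torch
import torch.nn as nn


class PoseQueryDecoder(nn.Module):
    """Learned pose queries regress full-body poses; a joint decoder refines them."""

    def __init__(self, num_queries=100, num_joints=17, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        # One learned query per pose hypothesis (set prediction, no NMS/grouping here).
        self.pose_queries = nn.Embedding(num_queries, d_model)
        pose_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.pose_decoder = nn.TransformerDecoder(pose_layer, num_layers)
        self.to_pose = nn.Linear(d_model, num_joints * 2)   # coarse (x, y) per joint
        # Joint-level refinement: each keypoint becomes a token so attention can
        # model kinematic relations between body joints of the same person.
        self.joint_embed = nn.Linear(2, d_model)
        joint_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.joint_decoder = nn.TransformerDecoder(joint_layer, num_layers)
        self.refine = nn.Linear(d_model, 2)                  # residual offset per joint
        self.num_joints = num_joints

    def forward(self, image_feats):
        """image_feats: (B, HW, d_model) flattened backbone/encoder features."""
        b = image_feats.size(0)
        queries = self.pose_queries.weight.unsqueeze(0).expand(b, -1, -1)
        pose_feats = self.pose_decoder(queries, image_feats)            # (B, Q, d_model)
        coarse = self.to_pose(pose_feats).view(b, -1, self.num_joints, 2)

        # Refine each predicted pose: its joints attend to each other and to the image.
        bq = b * coarse.size(1)
        joint_tokens = self.joint_embed(coarse.view(bq, self.num_joints, 2))
        ctx = image_feats.repeat_interleave(coarse.size(1), dim=0)      # (B*Q, HW, d_model)
        refined_feats = self.joint_decoder(joint_tokens, ctx)
        refined = coarse.view(bq, self.num_joints, 2) + self.refine(refined_feats)
        return coarse, refined.view(b, -1, self.num_joints, 2)


if __name__ == "__main__":
    feats = torch.randn(2, 32 * 24, 256)    # stand-in for flattened image features
    coarse, refined = PoseQueryDecoder()(feats)
    print(coarse.shape, refined.shape)       # torch.Size([2, 100, 17, 2]) twice
```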
Author Ren, Ye
Wei, Xing
Tan, Wenming
Li, Liangqi
Shi, Dahu
Author_xml – sequence: 1
  givenname: Dahu
  surname: Shi
  fullname: Shi, Dahu
  email: shidahu@hikvision.com
  organization: Hikvision Research Institute, Hangzhou, China
– sequence: 2
  givenname: Xing
  surname: Wei
  fullname: Wei, Xing
  email: weixing@mail.xjtu.edu.cn
  organization: School of Software Engineering, Xi'an Jiaotong University
– sequence: 3
  givenname: Liangqi
  surname: Li
  fullname: Li, Liangqi
  email: liliangqi@hikvision.com
  organization: Hikvision Research Institute, Hangzhou, China
– sequence: 4
  givenname: Ye
  surname: Ren
  fullname: Ren, Ye
  email: renye@hikvision.com
  organization: Hikvision Research Institute, Hangzhou, China
– sequence: 5
  givenname: Wenming
  surname: Tan
  fullname: Tan, Wenming
  email: tanwenming@hikvision.com
  organization: Hikvision Research Institute, Hangzhou, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52688.2022.01079
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
EISBN 1665469463
9781665469463
EISSN 2575-7075
EndPage 11068
ExternalDocumentID 9878630
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 2020AAA0105600
  funderid: 10.13039/501100001809
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
ParticipantIDs ieee_primary_9878630
PublicationCentury 2000
PublicationDate 2022-June
PublicationDateYYYYMMDD 2022-06-01
PublicationDate_xml – month: 06
  year: 2022
  text: 2022-June
PublicationDecade 2020
PublicationTitle 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev CVPR
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 11059
SubjectTerms Benchmark testing
Codes
Computer vision
Image analysis
Kinematics
Location awareness
Pose estimation
Pose estimation and tracking
Recognition: detection, categorization, retrieval
Scene analysis and understanding
Title End-to-End Multi-Person Pose Estimation with Transformers
URI https://ieeexplore.ieee.org/document/9878630