SAMPose: Multi-Person Pose Estimation based on Segment Anything Model

Bibliographic Details
Published in 2024 International Joint Conference on Neural Networks (IJCNN) pp. 1 - 8
Main Authors Li, Jiechen; Kong, Ruoshan; Liu, Feng
Format Conference Proceeding
Language English
Published IEEE 30.06.2024
Abstract Recently, there have been numerous advances in the field of 2D human pose estimation. We observed that conventional methods generally use a human detector or rely on ground truth to obtain the bounding box of the human subject. The image cropped by the bounding box is then used directly as input for training. This may introduce irrelevant and distracting information, reducing prediction accuracy. Building on the recently proposed large vision model, the Segment Anything Model (SAM), we devised a method that leverages segmentation to improve the accuracy of pose estimation. We introduce SAMPose, a multi-person pose estimation method that uses the Segment Anything Model to improve performance. First, we use SAM to segment the person in the image; this segmentation mask is then used to enhance the input image, guiding the network to be more aware of the person region in the picture. In our experiments, SAMPose-m achieves 62.3% Average Precision (AP) on the COCO-wholebody benchmark, a 4.1% improvement over the baseline. SAMPose-m outperforms the state-of-the-art (SOTA) model DWPose at the same model size. Additionally, our model achieves 75.4% AP on the COCO-wholebody benchmark. Moreover, our model exhibits improvements of 1.4% and 1.9% on CrowdPose and MPII, respectively. Furthermore, we apply our SAM-based method to several other well-known milestone methods, such as HRNet, SimCC, and DeepPose, and observe improvement across all of them.
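The abstract does not say how the segmentation mask enhances the input image, so the following is only a minimal sketch under stated assumptions. It assumes a binary person mask has already been produced (for example by SAM) for a cropped image, and it illustrates two plausible forms of enhancement: attenuating background pixels and appending the mask as an extra input channel. The function name enhance_with_mask and the parameter background_weight are hypothetical and do not come from the paper.

```python
import numpy as np


def enhance_with_mask(image, mask, background_weight=0.5):
    """Illustrative mask-based input enhancement (an assumption, not the paper's method).

    image: H x W x 3 uint8 crop of a person.
    mask:  H x W boolean array, True where the person is (assumed to come from SAM).
    Returns an H x W x 4 float32 array: the background-attenuated RGB image
    followed by the mask as a fourth channel.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")

    rgb = image.astype(np.float32) / 255.0          # scale pixels to [0, 1]
    person = mask.astype(np.float32)[..., None]     # H x W x 1, values 0.0 or 1.0

    # Person pixels keep full weight; background pixels are scaled down.
    weights = person + (1.0 - person) * background_weight
    attenuated = rgb * weights

    # Append the mask so a network could also see the person region explicitly.
    return np.concatenate([attenuated, person], axis=-1)


# Toy example: a white 4 x 4 crop whose central 2 x 2 block is the "person".
image = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
enhanced = enhance_with_mask(image, mask)
```

A network consuming this output would need a four-channel first layer; dropping the concatenation and keeping only the attenuated image would leave the input shape unchanged. Either choice is an illustration of the idea, not a description of SAMPose itself.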
Author Kong, Ruoshan
Li, Jiechen
Liu, Feng
Author_xml – sequence: 1
  givenname: Jiechen
  surname: Li
  fullname: Li, Jiechen
  email: ljclll@whu.edu.cn
  organization: Wuhan University, School of Computer Science, Wuhan, China
– sequence: 2
  givenname: Ruoshan
  surname: Kong
  fullname: Kong, Ruoshan
  email: krs1024@126.com
  organization: Wuhan University, School of Computer Science, Wuhan, China
– sequence: 3
  givenname: Feng
  surname: Liu
  fullname: Liu, Feng
  email: fliuwhu@whu.edu.cn
  organization: Wuhan University, School of Computer Science, Wuhan, China
DOI 10.1109/IJCNN60899.2024.10650081
Discipline Computer Science
EISBN 9798350359312
EISSN 2161-4407
SubjectTerms Accuracy
Detectors
Head
Image segmentation
Neural networks
Pose estimation
Training
URI https://ieeexplore.ieee.org/document/10650081