SAMPose: Multi-Person Pose Estimation based on Segment Anything Model

Bibliographic Details
Published in	2024 International Joint Conference on Neural Networks (IJCNN) pp. 1 - 8
Main Authors	Li, Jiechen; Kong, Ruoshan; Liu, Feng
Format	Conference Proceeding
Language	English
Published	IEEE 30.06.2024
Subjects	Accuracy; Detectors; Head; Image segmentation; Neural networks; Pose estimation; Training

Abstract	Recently, there have been numerous advances in 2D human pose estimation. We observe that conventional methods generally use a human detector, or rely on ground-truth annotations, to obtain the bounding box of each person. The image cropped by this bounding box is then used directly as the training input, which can introduce irrelevant, distracting background content and reduce prediction accuracy. Motivated by the newly proposed large vision model, the Segment Anything Model (SAM), we devise a method that leverages segmentation to improve the accuracy of pose estimation. We introduce SAMPose, a multi-person pose estimation method that uses SAM to improve performance. First, we use SAM to segment the person in the image; this segmentation mask is then used to enhance the input image, guiding the network to pay more attention to the person region of the picture. In our experiments, SAMPose-m achieves 62.3% Average Precision (AP) on the COCO-WholeBody benchmark, a 4.1% improvement over the baseline, and outperforms the state-of-the-art (SOTA) model DWPose at the same model size. Additionally, our model achieves 75.4% AP on the COCO-WholeBody benchmark. Moreover, our model improves results by 1.4% on CrowdPose and 1.9% on MPII. Finally, we apply our SAM-based method to several other well-known, milestone methods, such as HRNet, SimCC, and DeepPose, and it improves all of them.
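The abstract says the SAM mask is used to "enhance" the cropped input but does not say how the mask and image are fused. The sketch below illustrates the general idea under stated assumptions: a person is cropped by a detector box, and that person's binary mask (assumed to come from SAM, a step not shown here) is either appended as a fourth input channel or used to zero out the background. The helper name `mask_enhanced_crop`, its `mode` options, and the `(x0, y0, x1, y1)` box convention are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def mask_enhanced_crop(
    image: np.ndarray,                # H x W x 3 image
    person_mask: np.ndarray,          # H x W boolean mask for ONE person (assumed SAM output)
    bbox: tuple[int, int, int, int],  # (x0, y0, x1, y1) in pixels, end-exclusive
    mode: str = "concat",
) -> np.ndarray:
    """Hypothetical helper: crop one person and fuse their segmentation mask.

    mode="concat":   append the mask as a 4th channel  -> H' x W' x 4
    mode="suppress": zero out pixels outside the mask  -> H' x W' x 3
    """
    h, w = person_mask.shape
    if image.shape[:2] != (h, w):
        raise ValueError("image and mask must have the same spatial size")

    # Clamp the box to the image so boxes that spill past the border still work.
    x0, y0, x1, y1 = bbox
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("empty bounding box after clamping")

    crop = image[y0:y1, x0:x1].astype(np.float32)
    m = person_mask[y0:y1, x0:x1].astype(np.float32)[..., None]  # H' x W' x 1

    if mode == "concat":
        return np.concatenate([crop, m], axis=-1)
    if mode == "suppress":
        return crop * m  # broadcasts the single mask channel over RGB
    raise ValueError(f"unknown mode: {mode!r}")


# Usage on synthetic data: one person occupying part of a 64 x 48 image.
rng = np.random.default_rng(0)
img = rng.random((64, 48, 3), dtype=np.float32)
mask = np.zeros((64, 48), dtype=bool)
mask[10:50, 12:36] = True
x = mask_enhanced_crop(img, mask, (8, 4, 40, 60))
print(x.shape)  # (56, 32, 4)
```

The "concat" option keeps all background pixels and lets the network learn how much to rely on the mask, but the backbone's first convolution must then accept four input channels. The "suppress" option keeps a standard three-channel input and discards background pixels outright, which also discards their context.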
Affiliation	School of Computer Science, Wuhan University, Wuhan, China (all authors)
Contact	Li, Jiechen (ljclll@whu.edu.cn); Kong, Ruoshan (krs1024@126.com); Liu, Feng (fliuwhu@whu.edu.cn)
DOI	10.1109/IJCNN60899.2024.10650081
Discipline	Computer Science
EISBN	9798350359312
EISSN	2161-4407
URI	https://ieeexplore.ieee.org/document/10650081