SAMPose: Multi-Person Pose Estimation based on Segment Anything Model
Published in | 2024 International Joint Conference on Neural Networks (IJCNN) pp. 1 - 8 |
---|---|
Main Authors | Li, Jiechen; Kong, Ruoshan; Liu, Feng |
Format | Conference Proceeding |
Language | English |
Published | IEEE 30.06.2024 |
Subjects | Accuracy; Detectors; Head; Image segmentation; Neural networks; Pose estimation; Training |
Online Access | Get full text |
Abstract | Recently, there have been numerous advances in the field of 2D human pose estimation. We observed that conventional methods generally use a human detector or rely on ground truth to obtain the bounding box of the human subject. The image cropped by the bounding box is then used directly as input for training. This may introduce irrelevant and distracting information, reducing prediction accuracy. Building on the recently proposed large vision model, the Segment Anything Model (SAM), we devised a method that leverages segmentation to improve the accuracy of pose estimation. We introduce SAMPose, a multi-person pose estimation method that uses the Segment Anything Model to improve performance. First, we use SAM to segment the person in the image; this segmentation mask is then used to enhance the input image, guiding the network to be more aware of the person region in the picture. In our experiments, SAMPose-m achieves a 62.3% Average Precision (AP) on the COCO-wholebody benchmark, representing a 4.1% improvement over the baseline. SAMPose-m outperforms the state-of-the-art model DWPose at the same model size. Additionally, our model achieves a 75.4% AP on the COCO-wholebody benchmark. Moreover, our model exhibits improvements of 1.4% and 1.9% on CrowdPose and MPII, respectively. Furthermore, we apply our SAM-based method to several other well-known milestone methods, such as HRNet, SimCC, and DeepPose, demonstrating improvement across all of them. |
---|---|
Author | Kong, Ruoshan Li, Jiechen Liu, Feng |
Author_xml | – sequence: 1 givenname: Jiechen surname: Li fullname: Li, Jiechen email: ljclll@whu.edu.cn organization: Wuhan University,School of Computer Science,Wuhan,China – sequence: 2 givenname: Ruoshan surname: Kong fullname: Kong, Ruoshan email: krs1024@126.com organization: Wuhan University,School of Computer Science,Wuhan,China – sequence: 3 givenname: Feng surname: Liu fullname: Liu, Feng email: fliuwhu@whu.edu.cn organization: Wuhan University,School of Computer Science,Wuhan,China |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/IJCNN60899.2024.10650081 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEL IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEL url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9798350359312 |
EISSN | 2161-4407 |
EndPage | 8 |
ExternalDocumentID | 10650081 |
Genre | orig-research |
GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI OCL RIE RIL RIO RNS |
ID | FETCH-ieee_primary_106500813 |
IEDL.DBID | RIE |
IngestDate | Wed Sep 18 05:50:09 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-ieee_primary_106500813 |
ParticipantIDs | ieee_primary_10650081 |
PublicationCentury | 2000 |
PublicationDate | 2024-June-30 |
PublicationDateYYYYMMDD | 2024-06-30 |
PublicationDate_xml | – month: 06 year: 2024 text: 2024-June-30 day: 30 |
PublicationDecade | 2020 |
PublicationTitle | 2024 International Joint Conference on Neural Networks (IJCNN) |
PublicationTitleAbbrev | IJCNN |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0023993 |
Score | 3.8473926 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | Accuracy Detectors Head Image segmentation Neural networks Pose estimation Training |
Title | SAMPose: Multi-Person Pose Estimation based on Segment Anything Model |
URI | https://ieeexplore.ieee.org/document/10650081 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2024+International+Joint+Conference+on+Neural+Networks+%28IJCNN%29&rft.atitle=SAMPose%3A+Multi-Person+Pose+Estimation+based+on+Segment+Anything+Model&rft.au=Li%2C+Jiechen&rft.au=Kong%2C+Ruoshan&rft.au=Liu%2C+Feng&rft.date=2024-06-30&rft.pub=IEEE&rft.eissn=2161-4407&rft.spage=1&rft.epage=8&rft_id=info:doi/10.1109%2FIJCNN60899.2024.10650081&rft.externalDocID=10650081 |
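
The abstract states that a SAM segmentation mask is used to enhance the input image so the network attends more to the person region, but it does not say how the mask is applied. The following is a minimal sketch of one possible enhancement, assuming background pixels are simply attenuated by a constant factor. The function name `enhance_with_mask`, the `background_gain` parameter, and the attenuation rule itself are hypothetical illustrations, not the authors' method, and the mask is a hand-written stand-in for real SAM output.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities
Mask = List[List[int]]     # binary mask: 1 = person pixel, 0 = background


def enhance_with_mask(image: Image, mask: Mask, background_gain: float = 0.3) -> Image:
    """Keep person pixels unchanged and scale background pixels down.

    Hypothetical enhancement rule (an assumption for illustration);
    the abstract does not specify how SAMPose combines mask and image.
    """
    # The mask must align pixel-for-pixel with the image.
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else pixel * background_gain for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# Toy 2x2 example; in practice the mask would come from a segmentation model.
image = [[10.0, 20.0], [30.0, 40.0]]
mask = [[1, 0], [0, 1]]
enhanced = enhance_with_mask(image, mask, background_gain=0.5)
```

Attenuating rather than zeroing the background keeps some context around the person, which is one plausible reading of "enhancement"; other choices, such as feeding the mask as an extra input channel, would fit the abstract's description equally well.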