SAMPose: Multi-Person Pose Estimation based on Segment Anything Model


Bibliographic Details
Published in	2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8
Main Authors	Li, Jiechen; Kong, Ruoshan; Liu, Feng
Format	Conference Proceeding
Language	English
Published	IEEE 30.06.2024
Subjects	Accuracy; Detectors; Head; Image segmentation; Neural networks; Pose estimation; Training

More Information
Summary:	Recently, there have been numerous advances in the field of 2D human pose estimation. We observed that conventional methods generally use a human detector or rely on ground truth to obtain the bounding box of the human subject. The image cropped by the bounding box is then used directly as input for training. This may introduce irrelevant and distracting information that reduces prediction accuracy. Motivated by the recently proposed large vision model, the Segment Anything Model (SAM), we devised a method that leverages segmentation to improve the accuracy of pose estimation. Consequently, we introduce SAMPose, a multi-person pose estimation method that uses SAM to improve performance. First, we use SAM to segment the person in the image; the resulting segmentation mask is then used to enhance the input image, guiding the network to be more aware of the person region in the picture. In our experiments, SAMPose-m achieves 62.3% Average Precision (AP) on the COCO-WholeBody benchmark, a 4.1% improvement over the baseline, and beats the SOTA model DWPose at the same model size. Additionally, our model achieves 75.4% AP on the COCO-WholeBody benchmark. Moreover, our model exhibits improvements of 1.4% and 1.9% on CrowdPose and MPII, respectively. Furthermore, we apply our SAM-based method to several other well-known milestone methods, such as HRNet, SimCC, and DeepPose, and observe improvements across all of them.
ISSN:	2161-4407
DOI:	10.1109/IJCNN60899.2024.10650081
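
The summary describes using a person segmentation mask to enhance the cropped input image, but it does not say how the two are combined. The following is only a minimal sketch of one plausible reading, not the authors' method: it assumes the mask is a binary array aligned with the image (for example, one produced by SAM for a person's bounding box) and dims the background pixels. The function names `crop_box` and `enhance_with_mask` and the parameter `background_weight` are hypothetical.

```python
import numpy as np


def crop_box(array, box):
    """Crop an H x W (x C) array to an (x0, y0, x1, y1) pixel box."""
    x0, y0, x1, y1 = box
    return array[y0:y1, x0:x1]


def enhance_with_mask(image, mask, background_weight=0.3):
    """Dim pixels outside the person mask (hypothetical enhancement).

    image: H x W x C uint8 array (the bounding-box crop).
    mask:  H x W boolean array, True on the person; assumed to come
           from a segmentation model such as SAM.
    background_weight: assumed scaling in [0, 1] for non-person pixels.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    if not 0.0 <= background_weight <= 1.0:
        raise ValueError("background_weight must lie in [0, 1]")
    # Weight 1.0 on the person, background_weight everywhere else.
    weights = np.where(mask.astype(bool), 1.0, background_weight)
    # Broadcast the per-pixel weight over the channel axis.
    enhanced = image.astype(np.float32) * weights[..., None]
    return np.clip(np.rint(enhanced), 0, 255).astype(np.uint8)
```

In a top-down pipeline, the same box would be applied to both the image and the mask with `crop_box` before calling `enhance_with_mask`, and the result would be passed to the pose network in place of the plain crop. Other combinations, such as appending the mask as an extra input channel, are equally consistent with the summary.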