SAMPose: Multi-Person Pose Estimation based on Segment Anything Model
Published in | 2024 International Joint Conference on Neural Networks (IJCNN) pp. 1 - 8 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 30.06.2024 |
Subjects | |
Summary: | Recently, there have been numerous advances in the field of 2D human pose estimation. We observe that conventional methods generally use a human detector, or rely on ground truth, to obtain the bounding box of the human subject. The image cropped by this bounding box is then used directly as the training input. This crop may contain irrelevant and distracting information, which reduces prediction accuracy. Motivated by the recently proposed large vision model, the Segment Anything Model (SAM), we devised a method that leverages segmentation to improve pose estimation accuracy. We introduce SAMPose, a multi-person pose estimation method that uses SAM to improve performance. First, we use SAM to segment the person in the image; the resulting segmentation mask is then used to enhance the input image, guiding the network to pay more attention to the person region of the picture. In experiments, our SAMPose-m achieves 62.3% Average Precision (AP) on the COCO-WholeBody benchmark, a 4.1% improvement over the baseline, and outperforms the state-of-the-art (SOTA) model DWPose at the same model size. Additionally, our model achieves 75.4% AP on the COCO-WholeBody benchmark. Moreover, our model improves results by 1.4% on CrowdPose and 1.9% on MPII. Finally, we apply our SAM-based method to other well-known milestone methods, such as HRNet, SimCC, and DeepPose, and it improves all of them. |
---|---|
ISSN: | 2161-4407 |
DOI: | 10.1109/IJCNN60899.2024.10650081 |
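The summary states that a SAM person mask is used to enhance the cropped input image, but it does not specify how the mask and image are combined. The snippet below is a minimal sketch under stated assumptions: the person mask is taken as already produced by SAM and the bounding box as already provided by a detector or ground truth. Image and mask are cropped with the same box, then two common fusion options are shown: appending the mask as a fourth input channel, or dimming background pixels. The names `crop`, `fuse_concat`, `fuse_suppress`, `prepare_input` and the `alpha` value are hypothetical illustrations, not SAMPose's published method. Plain Python lists keep the example dependency-free; a real pipeline would operate on tensors.

```python
from typing import List, Tuple

# Hypothetical data layout (an assumption, not from the paper):
Pixel = Tuple[int, ...]      # channel values for one pixel, e.g. (r, g, b)
Image = List[List[Pixel]]    # H x W grid of pixels
Mask = List[List[int]]       # H x W grid of 0/1 values (1 = person, from SAM)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open on x1/y1


def crop(grid, box: Box):
    """Crop any H x W grid (image or mask) to the given bounding box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in grid[y0:y1]]


def fuse_concat(image: Image, mask: Mask) -> Image:
    """Option A (assumed): append the mask as an extra 0/255 channel."""
    return [
        [pix + (255 * m,) for pix, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def fuse_suppress(image: Image, mask: Mask, alpha: float = 0.2) -> Image:
    """Option B (assumed): keep person pixels, scale background by alpha."""
    return [
        [pix if m else tuple(round(c * alpha) for c in pix)
         for pix, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def prepare_input(image: Image, mask: Mask, box: Box, mode: str = "concat") -> Image:
    """Crop image and mask with the same box, then fuse them into one input."""
    img_crop, mask_crop = crop(image, box), crop(mask, box)
    if mode == "concat":
        return fuse_concat(img_crop, mask_crop)
    if mode == "suppress":
        return fuse_suppress(img_crop, mask_crop)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with a 3 x 4 grey image whose middle two columns are marked as person, `prepare_input(image, mask, (0, 0, 3, 2), mode="suppress")` returns a 2 x 3 crop in which the first column is dimmed to `(20, 20, 20)` and the person pixels keep `(100, 100, 100)`. Channel concatenation preserves all image content and lets the network learn how to use the mask, while suppression removes distracting context outright; which one (if either) matches SAMPose is not stated in this record.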