YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

We introduce YOLO-pose, a novel heatmap-free approach for joint detection, and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a s...

Full description

Saved in:

Bibliographic Details
Published in	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) pp. 2636 - 2645
Main Authors	Maji, Debapriya, Nagori, Soyeb, Mathew, Manu, Poddar, Deepak
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2022
Subjects	Conferences Heating systems Measurement Object detection Pose estimation Skeleton Training
Online Access	Get full text

Cover

Loading…

More Information
Summary:	We introduce YOLO-pose, a novel heatmap-free approach for joint detection, and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2Dposes in a single forward pass and thus bringing in the best of both top-down and bottom-up approaches. Proposed approach doesn't require the post- processing of bottom-up approaches to group detected keypoints into a skeleton as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be madepublicly available at https://github.com/TexasInstruments/edgeai-yolov5 https://github.com/TexasInstruments/edgeai-yolox
ISSN:	2160-7516
DOI:	10.1109/CVPRW56347.2022.00297