HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 5385 - 5394
Main Authors: Cheng, Bowen; Xiao, Bin; Wang, Jingdong; Shi, Honghui; Huang, Thomas S.; Zhang, Lei
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2020
ISSN: 1063-6919
DOI: 10.1109/CVPR42600.2020.00543

Summary: Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and to localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of the feature map outputs from HRNet together with higher-resolution outputs upsampled through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.
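The abstract describes the core mechanism concretely enough for a short sketch: a transposed convolution adds a higher-resolution heatmap branch on top of HRNet's output, and at inference the heatmaps from the two resolutions are aggregated. Below is a minimal PyTorch sketch of that idea; it is not the authors' released implementation, and the channel width (32), keypoint count (17), and the concatenation of features with predicted heatmaps before the deconvolution are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HigherResolutionHead(nn.Module):
    """Adds a 2x-upsampled heatmap branch on top of HRNet's highest-resolution features."""

    def __init__(self, in_channels=32, num_keypoints=17):
        super().__init__()
        # Heatmaps predicted directly from HRNet's highest-resolution output.
        self.low_res_head = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)
        # Transposed convolution (4x4, stride 2) doubles the spatial resolution.
        self.deconv = nn.ConvTranspose2d(
            in_channels + num_keypoints, in_channels,
            kernel_size=4, stride=2, padding=1)
        self.high_res_head = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)

    def forward(self, features):
        low_res_heatmaps = self.low_res_head(features)
        # Feed both features and predicted heatmaps into the upsampling branch.
        x = torch.cat([features, low_res_heatmaps], dim=1)
        high_res_heatmaps = self.high_res_head(self.deconv(x))
        # Both outputs are supervised during training (multi-resolution supervision).
        return low_res_heatmaps, high_res_heatmaps

def aggregate_heatmaps(low_res, high_res):
    # Multi-resolution aggregation for inference: resize the lower-resolution
    # heatmaps to the higher resolution and average the two predictions.
    low_up = F.interpolate(low_res, size=high_res.shape[-2:],
                           mode='bilinear', align_corners=False)
    return (low_up + high_res) / 2

In this sketch, training would apply a regression loss against ground-truth Gaussian heatmaps rendered at both resolutions, which is one plausible reading of the "multi-resolution supervision" mentioned in the abstract.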