Unified End-to-End YOLOv5-HR-TCM Framework for Automatic 2D/3D Human Pose Estimation for Real-Time Applications

Three-dimensional human pose estimation is widely applied in sports, robotics, and healthcare. In the past five years, the number of CNN-based studies for 3D human pose estimation has been numerous and has yielded impressive results. However, studies often focus only on improving the accuracy of the...

Full description

Saved in:

Bibliographic Details
Published in	Sensors (Basel, Switzerland) Vol. 22; no. 14; p. 5419
Main Authors	Nguyen, Hung-Cuong, Nguyen, Thi-Hao, Scherer, Rafal, Le, Van-Hung
Format	Journal Article
Language	English
Published	Basel MDPI AG 20.07.2022 MDPI
Subjects	2D/3D human pose estimation Accuracy Best practice Convolutional Neural Network Datasets Deviation Estimation Methods Pose estimation Pose-based Sports Application Sports Temporal Convolution Model YOLOv5
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Three-dimensional human pose estimation is widely applied in sports, robotics, and healthcare. In the past five years, the number of CNN-based studies for 3D human pose estimation has been numerous and has yielded impressive results. However, studies often focus only on improving the accuracy of the estimation results. In this paper, we propose a fast, unified end-to-end model for estimating 3D human pose, called YOLOv5-HR-TCM (YOLOv5-HRet-Temporal Convolution Model). Our proposed model is based on the 2D to 3D lifting approach for 3D human pose estimation while taking care of each step in the estimation process, such as person detection, 2D human pose estimation, and 3D human pose estimation. The proposed model is a combination of best practices at each stage. Our proposed model is evaluated on the Human 3.6M dataset and compared with other methods at each step. The method achieves high accuracy, not sacrificing processing speed. The estimated time of the whole process is 3.146 FPS on a low-end computer. In particular, we propose a sports scoring application based on the deviation angle between the estimated 3D human posture and the standard (reference) origin. The average deviation angle evaluated on the Human 3.6M dataset (Protocol #1–Pro #1) is 8.2 degrees.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	1424-8220 1424-8220
DOI:	10.3390/s22145419