Three-Dimensional Human Pose Estimation from Sparse IMUs through Temporal Encoder and Regression Decoder

Bibliographic Details
Published in	Sensors (Basel, Switzerland) Vol. 23; no. 7; p. 3547
Main Authors	Liao, Xianhua; Dong, Jiayan; Song, Kangkang; Xiao, Jiangjian
Format	Journal Article
Language	English
Published	Switzerland: MDPI AG, 28.03.2023
Subjects	Algorithms; Analysis; Biomechanical Phenomena; Calibration; Datasets; Deep learning; encoder–decoder; Human body; human kinematics hierarchy; Human motion; Humans; Inertial platforms; Kinematics; Motion; Motion capture; Movement; Neural networks; Neural Networks, Computer; Performance evaluation; Pose estimation; Position errors; regression decoder; Sensors; Shaking; sparse IMUs; Teaching methods; temporal convolutional encoder; Three dimensional analysis; Three dimensional motion; three-dimensional human pose; Topology
Summary:	Three-dimensional (3D) pose estimation is widely used in 3D human motion analysis applications, where inertia-based path estimation is gradually being adopted. Systems based on commercial inertial measurement units (IMUs) usually rely on dense, complex wearable sensors and time-consuming calibration, which intrude on the subject and hinder free body movement. Sparse-IMU-based methods have drawn research attention recently. Existing sparse-IMU-based 3D pose estimation methods use neural networks to obtain human poses from temporal feature information. However, these methods still suffer from issues such as body shaking, body tilt, and movement ambiguity. This paper presents an approach that improves 3D human pose estimation by fusing temporal and spatial features. Based on a multistage encoder-decoder network, a temporal convolutional encoder and a human kinematics regression decoder were designed. The final 3D pose is predicted from the temporal feature information and the human kinematic feature information. Extensive experiments were conducted on two benchmark datasets for 3D human pose estimation. Compared to state-of-the-art methods, the mean per joint position error decreased by 13.6% and 19.4% on the TotalCapture and DIP-IMU datasets, respectively. The quantitative comparison demonstrates that the proposed temporal information and human kinematic topology can improve pose accuracy.
ISSN:	1424-8220
DOI:	10.3390/s23073547
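
The summary reports its results as relative reductions in mean per joint position error (MPJPE). As a rough illustration of what that metric measures, the sketch below computes MPJPE as the average Euclidean distance between predicted and ground-truth joint positions, and expresses an improvement as a percentage of a baseline. The function names `mpjpe` and `relative_reduction` are hypothetical, the input values are made up, and the record does not say whether the paper aligns poses (for example at the root joint) before measuring, so no alignment is applied here.

```python
import math
from typing import Sequence

# One joint is an (x, y, z) position; one frame is a list of joints.
Joint = Sequence[float]
Frame = Sequence[Joint]


def mpjpe(predicted: Sequence[Frame], ground_truth: Sequence[Frame]) -> float:
    """Average Euclidean distance between corresponding joints over all frames.

    Assumption: both inputs use the same unit and joint ordering, and no
    alignment step is applied before measuring.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("frame counts differ")
    total = 0.0
    count = 0
    for pred_frame, true_frame in zip(predicted, ground_truth):
        if len(pred_frame) != len(true_frame):
            raise ValueError("joint counts differ within a frame")
        for pred_joint, true_joint in zip(pred_frame, true_frame):
            total += math.dist(pred_joint, true_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Made-up toy data: one frame with two joints.
    pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    truth = [[(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]]
    print(mpjpe(pred, truth))
    print(relative_reduction(10.0, 8.0))
```

Under this definition, a 13.6% decrease means the new method's MPJPE is 13.6% smaller than that of the method it is compared against on the same dataset.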