Deep State-Space Model for Noise Tolerant Skeleton-Based Action Recognition


Bibliographic Details
Published in: IEICE Transactions on Information and Systems, Vol. E103.D, No. 6, pp. 1217-1225
Main Authors: KAWAMURA, Kazuki; MATSUBARA, Takashi; UEHARA, Kuniaki
Format: Journal Article
Language: English
Published: Tokyo: The Institute of Electronics, Information and Communication Engineers, 01.06.2020; Japan Science and Technology Agency
More Information
Summary: Action recognition using skeleton data (3D coordinates of human joints) is an attractive topic due to its robustness to the actor's appearance, the camera's viewpoint, illumination, and other environmental conditions. However, skeleton data must be measured by a depth sensor or extracted from video data using an estimation algorithm, both of which introduce extraction errors and noise. In this work, for robust skeleton-based action recognition, we propose a deep state-space model (DSSM). The DSSM is a deep generative model of the underlying dynamics of an observable sequence. We applied the proposed DSSM to skeleton data, and the results demonstrate that it improves the classification performance of a baseline method. Moreover, we confirm that feature extraction with the proposed DSSM renders subsequent classification robust to noise and missing values. In these experimental settings, the proposed DSSM outperforms a state-of-the-art method.
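
The summary describes the DSSM only at a high level. As a rough illustration of what a deep generative state-space model over an observable sequence can look like, the sketch below implements a generic variational state-space model in PyTorch. The layer sizes, the Gaussian transition/emission/inference networks, the observation dimension (assumed here to be 25 joints x 3 coordinates), and the ELBO-style training loss are assumptions for illustration only, not the architecture reported in the paper.

```python
# Minimal sketch of a deep state-space model (DSSM) for sequence data.
# All design choices here (Gaussian latents, MLP networks, ELBO loss)
# are illustrative assumptions, not the authors' exact model.
import torch
import torch.nn as nn


class DeepStateSpaceModel(nn.Module):
    def __init__(self, obs_dim=75, latent_dim=32, hidden_dim=128):
        super().__init__()
        # Transition prior: p(z_t | z_{t-1}), parameterized by an MLP
        # that outputs mean and log-variance of a Gaussian.
        self.trans = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * latent_dim))
        # Emission: p(x_t | z_t) reconstructs the skeleton frame.
        self.emit = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim))
        # Inference: q(z_t | z_{t-1}, x_t) combines the previous latent
        # state with the current (possibly noisy) observation.
        self.infer = nn.Sequential(
            nn.Linear(latent_dim + obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * latent_dim))

    @staticmethod
    def _gaussian(params):
        mean, log_var = params.chunk(2, dim=-1)
        return mean, log_var

    def forward(self, x):
        # x: (batch, time, obs_dim), e.g. flattened 3D joint coordinates.
        batch, T, _ = x.shape
        z = x.new_zeros(batch, self.trans[0].in_features)
        recon_loss, kl_loss = 0.0, 0.0
        for t in range(T):
            # Prior parameters from the transition model.
            p_mean, p_logvar = self._gaussian(self.trans(z))
            # Approximate posterior from the inference model.
            q_mean, q_logvar = self._gaussian(
                self.infer(torch.cat([z, x[:, t]], dim=-1)))
            # Reparameterized sample of the latent state.
            z = q_mean + torch.randn_like(q_mean) * (0.5 * q_logvar).exp()
            # Reconstruction term (squared error) of the ELBO.
            recon_loss = recon_loss + (
                (self.emit(z) - x[:, t]) ** 2).sum(-1).mean()
            # KL(q || p) between two diagonal Gaussians.
            kl_loss = kl_loss + 0.5 * (
                p_logvar - q_logvar
                + (q_logvar.exp() + (q_mean - p_mean) ** 2) / p_logvar.exp()
                - 1).sum(-1).mean()
        return recon_loss + kl_loss
```

In such a setup, a downstream classifier would typically be trained on the inferred latent states z rather than on the raw, possibly noisy joint coordinates; this is the general sense in which feature extraction with a state-space model can make classification more tolerant to noise and missing values, though the paper's specific pipeline may differ.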
ISSN: 0916-8532, 1745-1361
DOI: 10.1587/transinf.2019MVP0012