Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40, No. 12, pp. 3007-3021
Main Authors: Liu, Jun; Shahroudy, Amir; Xu, Dong; Kot, Alex C.; Wang, Gang
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2018

More Information
Summary: Skeleton-based human action recognition has attracted considerable research attention in recent years. Recent works have attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in skeletal data. The proposed work extends this idea to the spatial domain as well as the temporal domain, analyzing the hidden sources of action-related information within human skeleton sequences in both domains simultaneously. Based on the pictorial structure of the Kinect's skeletal data, an effective tree-structure-based traversal framework is also proposed. To deal with the noise in the skeletal data, a new gating mechanism is introduced within the LSTM module, with which the network can learn the reliability of the sequential input data and accordingly adjust its effect on the updating procedure of the long-term context representation stored in the unit's memory cell. Moreover, a novel multi-modal feature fusion strategy within the LSTM unit is introduced. Comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.
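
The trust-gate mechanism described in the summary lends itself to a short sketch. Below is a minimal, self-contained NumPy illustration (not the paper's implementation) of one step of an LSTM cell extended with such a gate: the cell forms a prediction of the current input from its previous hidden state, converts the prediction error into a per-unit trust value via a Gaussian, and uses that value to scale how strongly the new input is written into the memory cell. The function name trust_gated_lstm_step, the weight matrices, the sharpness parameter lam, and the Gaussian form of the gate are all illustrative assumptions; biases are omitted for brevity.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def trust_gated_lstm_step(x, h_prev, c_prev, params, lam=0.5):
        # One step of an LSTM cell augmented with a trust gate
        # (illustrative sketch, not the paper's exact formulation).
        z = np.concatenate([x, h_prev])

        i = sigmoid(params["Wi"] @ z)   # input gate
        f = sigmoid(params["Wf"] @ z)   # forget gate
        o = sigmoid(params["Wo"] @ z)   # output gate
        u = np.tanh(params["Wu"] @ z)   # candidate memory update

        # Map the actual input and a context-based prediction of it into
        # the hidden space, then score reliability: a large prediction
        # error yields trust near 0, a small error trust near 1.
        x_proj = np.tanh(params["Mx"] @ x)
        x_pred = np.tanh(params["Mh"] @ h_prev)
        tau = np.exp(-lam * (x_proj - x_pred) ** 2)

        # The trust gate scales how strongly the current (possibly noisy)
        # input is written into the memory cell; low trust preserves the
        # stored long-term context instead.
        c = tau * (i * u) + (1.0 - tau) * (f * c_prev)
        h = o * np.tanh(c)
        return h, c

    # Tiny usage example: one 3D joint coordinate as input, 8 hidden units.
    rng = np.random.default_rng(0)
    d_x, d_h = 3, 8
    params = {
        "Wi": 0.1 * rng.standard_normal((d_h, d_x + d_h)),
        "Wf": 0.1 * rng.standard_normal((d_h, d_x + d_h)),
        "Wo": 0.1 * rng.standard_normal((d_h, d_x + d_h)),
        "Wu": 0.1 * rng.standard_normal((d_h, d_x + d_h)),
        "Mx": 0.1 * rng.standard_normal((d_h, d_x)),
        "Mh": 0.1 * rng.standard_normal((d_h, d_h)),
    }
    h, c = np.zeros(d_h), np.zeros(d_h)
    for x in rng.standard_normal((5, d_x)):   # a 5-frame toy sequence
        h, c = trust_gated_lstm_step(x, h, c, params)

Note that the paper's spatio-temporal unit also carries a second, spatial context channel following the tree-structured joint-traversal order (with its own forget gate); the single-channel temporal sketch above collapses this for brevity.
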
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2017.2771306