Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks

Bibliographic Details
Published in	IEEE Transactions on Image Processing, Vol. 27, No. 4, pp. 1586-1599
Main Authors	Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, Alex C. Kot
Format	Journal Article
Language	English
Published	United States: IEEE, April 2018
Subjects	Action recognition; Algorithms; Attention; Context modeling; Databases, Factual; Feature extraction; Global context memory; Hidden Markov models; Human Activities - classification; Humans; Logic gates; Long short-term memory; Machine Learning; Models, Neurological; Neural Networks, Computer; Pattern Recognition, Automated - methods; Reliability; Skeleton; Skeleton sequence; Three-dimensional displays

More Information
Summary:	Human action recognition in 3D skeleton sequences has attracted considerable research attention. Recently, long short-term memory (LSTM) networks have shown promising performance in this task due to their strength in modeling the dependencies and dynamics in sequential data. Because not all skeletal joints are informative for action recognition, and irrelevant joints often introduce noise that can degrade performance, more attention should be paid to the informative ones. However, the original LSTM network has no explicit attention mechanism. In this paper, we propose a new class of LSTM network, the global context-aware attention LSTM, for skeleton-based action recognition, which selectively focuses on the informative joints in each frame by using a global context memory cell. To further improve the attention capability, we also introduce a recurrent attention mechanism that progressively enhances the attention performance of the network. In addition, we introduce a two-stream framework that leverages both coarse-grained and fine-grained attention. The proposed method achieves state-of-the-art performance on five challenging datasets for skeleton-based action recognition.
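The core idea in the summary is to score each joint against a global context vector and refine that context over several attention iterations. The following is only a minimal, hedged sketch of that idea in plain Python, not the authors' network. The function name `global_context_attention` and the input `h` (precomputed per-frame, per-joint feature vectors) are hypothetical. Dot-product scoring, initializing the context as the mean of all joint features, and replacing the context with the mean attended summary on each iteration are simplifying assumptions. The learned scoring layers and the LSTM integration described in the paper are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def global_context_attention(
    h: List[List[Vector]], iterations: int = 2
) -> Tuple[List[List[float]], Vector]:
    """Hypothetical sketch: h[t][j] is a feature vector for joint j in frame t."""
    # Assumption: initialize the global context as the average over all frames and joints.
    g = mean([v for frame in h for v in frame])
    alphas: List[List[float]] = []
    for _ in range(iterations):
        alphas = []
        frame_summaries = []
        for frame in h:
            # Score each joint against the current global context (dot product,
            # an assumption), then normalize the scores over the joints of this frame.
            a = softmax([dot(v, g) for v in frame])
            alphas.append(a)
            # Attention-weighted sum of the joint features in this frame.
            frame_summaries.append(
                [sum(w * v[k] for w, v in zip(a, frame)) for k in range(len(g))]
            )
        # Recurrent refinement (simplified): the attended summary becomes the next context.
        g = mean(frame_summaries)
    return alphas, g
```

For a single frame with joints `[1.0, 0.0]`, `[0.0, 1.0]` and `[1.0, 1.0]`, calling `global_context_attention(h, iterations=3)` assigns the largest weight to the third joint, because it aligns best with the context. Each extra iteration re-scores the joints against the refined context, which loosely mirrors the "progressive" attention the summary describes.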
ISSN:	1057-7149; 1941-0042
DOI:	10.1109/TIP.2017.2785279