Skeleton-based attention-aware spatial–temporal model for action detection and recognition

Bibliographic Details
Published in	IET computer vision Vol. 14; no. 5; pp. 177 - 184
Main Authors	Cui, Ran, Zhu, Aichun, Wu, Jingran, Hua, Gang
Format	Journal Article
Language	English
Published	The Institution of Engineering and Technology 01.08.2020 Wiley
Subjects	action detection; action features; action location; action recognition; computer vision; conditional random field loss function; dynamic features; human skeleton; image motion analysis; image recognition; image representation; recurrent neural nets; recurrent neural network framework; Research Article; skeleton joints; skeleton-based action analysis model; skeleton-based attention-aware spatial–temporal model; static features; triple loss models action; video signal processing
Summary:	Action detection and recognition are popular research subjects in computer vision. Action detection can be regarded as the combination of action location and action recognition. Action features derived from the human skeleton are robust against external factors and require little computation. This study proposes a skeleton-based action analysis model built on a recurrent neural network framework. The model learns action features by modelling the static and dynamic features of skeleton joints, and it learns the importance of different video frames through an attention module. For action location, a conditional random field loss function is introduced to establish the context dependency of output labels. For action recognition, a hierarchical training mechanism with triple loss models action features at coarse-grained and fine-grained levels. The authors’ proposed method delivers state-of-the-art results on action location and recognition tasks.
ISSN:	1751-9632 1751-9640
DOI:	10.1049/iet-cvi.2019.0751