stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, no. 2, pp. 549-565
Main Authors	Qi, Mengshi; Wang, Yunhong; Qin, Jie; Li, Annan; Luo, Jiebo; Van Gool, Luc
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 February 2020
Subjects	Action Recognition; Activity recognition; Adaptation models; Group Activity Recognition; Hidden Markov models; Message passing; Performance evaluation; Recurrent neural networks; RNN; Scene Understanding; Semantic Graph; Semantics; Spatio-temporal Attention; Sports; Task analysis

Summary:	Group activity recognition plays a fundamental role in many real-world applications, e.g., sports video analysis, abnormal behavior detection, and intelligent surveillance. In complex dynamic scenes, a crucial yet challenging issue is how to better model spatio-temporal contextual information and inter-person relationships. In this paper, we present a novel attentive semantic recurrent neural network (RNN), named stagNet, for understanding group activities and individual actions in videos by combining a spatio-temporal attention mechanism with semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial context of the whole scene, and this graph is further combined with the temporal factor through a structural RNN. By virtue of its "factor sharing" and "message passing" mechanisms, stagNet can extract discriminative and informative spatio-temporal representations and capture inter-person relationships. Moreover, we adopt a spatio-temporal attention model that focuses on key persons and frames to improve recognition performance. In addition, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. Experiments on four widely used public datasets demonstrate the effectiveness of the method and its superiority over competing approaches.
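The summary mentions an attention model that focuses on key persons. As a rough illustration of what soft attention over persons means in general, the sketch below scores each person's feature vector against a query vector, normalizes the scores with a softmax, and returns the attention-weighted sum. This is a generic dot-product soft-attention sketch under stated assumptions, not the formulation used in stagNet: the query vector, the dot-product scoring, and the names `softmax` and `attention_pool` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    features: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical soft attention over per-person feature vectors.

    Each person's feature is scored by its dot product with `query`
    (an assumed stand-in for whatever context drives the attention),
    the scores are normalized into weights that sum to 1, and the
    weighted sum of the features is returned together with the weights.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Three toy "persons" with 2-D features; the query favors the first dimension.
    persons = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(persons, [2.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

In this toy case, persons whose features align with the query receive larger weights, so the pooled representation is dominated by them. The same weighting idea can be applied over frames instead of persons.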
ISSN:	1051-8215 (print); 1558-2205 (electronic)
DOI:	10.1109/TCSVT.2019.2894161