Sports Video Captioning via Attentive Motion Representation and Group Relationship Modeling

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, no. 8, pp. 2617-2633
Main Authors	Qi, Mengshi, Wang, Yunhong, Li, Annan, Luo, Jiebo
Format	Journal Article
Language	English
Published	New York: IEEE, 01.08.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Datasets; Games; group relationship; Logic gates; Modelling; Modules; motion representation; Professional sports; Recurrent neural networks; Representations; RNN; Semantics; Sports; Sports video; Task analysis; Trajectory; video captioning; Visualization; Volleyball
Summary:	Sports video captioning refers to the task of automatically generating a textual description for sports events (football, basketball, or volleyball games). Although a great deal of previous work has shown promising performance in producing coarse, general descriptions of a video, such descriptions lack professional sports knowledge, and it is still quite challenging to caption a sports video with multiple fine-grained player actions and complex group relationships between players. In this paper, we present a novel hierarchical recurrent neural network-based framework with an attention mechanism for sports video captioning, in which a motion representation module is proposed to capture individual pose attributes and dynamic trajectory cluster information with extra professional sports knowledge, and a group relationship module is employed to design a scene graph that models players' interactions with a gated graph convolutional network. Moreover, we introduce a new dataset called sports video captioning dataset-volleyball for evaluation. The proposed model is evaluated on three widely adopted public datasets and our newly collected dataset, on which the effectiveness of our method is demonstrated.
ISSN:	1051-8215, 1558-2205
DOI:	10.1109/TCSVT.2019.2921655