Syntax-Aware Action Targeting for Video Captioning

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 13093-13102
Main Authors: Zheng, Qi; Wang, Chaoyue; Tao, Dacheng
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2020

More Information
Summary: Existing methods on video captioning have made great efforts to identify objects/instances in videos, but few of them emphasize the prediction of action. As a result, the learned models are likely to depend heavily on the prior of training data, such as the co-occurrence of objects, which may cause an enormous divergence between the generated descriptions and the video content. In this paper, we explicitly emphasize the importance of action by predicting visually-related syntax components including subject, object and predicate. Specifically, we propose a Syntax-Aware Action Targeting (SAAT) module that first builds a self-attended scene representation to draw global dependence among multiple objects within a scene, and then decodes the visually-related syntax components by setting different queries. After targeting the action, indicated by the predicate, our captioner learns an attention distribution over the predicate and the previously predicted words to guide the generation of the next word. Comprehensive experiments on MSVD and MSR-VTT datasets demonstrate the efficacy of the proposed model.
ISSN: 1063-6919
DOI: 10.1109/CVPR42600.2020.01311
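
The summary above outlines a two-stage design: self-attention over detected objects to form a scene representation, then learnable queries that decode the subject, predicate and object from it. The PyTorch sketch below illustrates that general idea only; the module name SyntaxTargeting, the layer sizes, the number of attention heads and the single linear word head are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a syntax-targeting module in the spirit of the summary above.
# All names and hyperparameters are assumptions made for illustration.
import torch
import torch.nn as nn

class SyntaxTargeting(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=10000):
        super().__init__()
        # Self-attention draws global dependence among the detected objects in a scene.
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        # One learnable query per syntax component: subject, predicate, object.
        self.queries = nn.Parameter(torch.randn(3, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        # Project each attended component to a distribution over the word vocabulary.
        self.word_head = nn.Linear(feat_dim, vocab_size)

    def forward(self, obj_feats):
        # obj_feats: (batch, num_objects, feat_dim) object/region features per video.
        scene, _ = self.self_attn(obj_feats, obj_feats, obj_feats)  # self-attended scene
        q = self.queries.unsqueeze(0).expand(obj_feats.size(0), -1, -1)
        comp, _ = self.cross_attn(q, scene, scene)                  # (batch, 3, feat_dim)
        logits = self.word_head(comp)                               # words for subj/pred/obj
        return comp, logits

# Usage: 8 videos, 10 detected objects each.
model = SyntaxTargeting()
feats = torch.randn(8, 10, 512)
components, word_logits = model(feats)
print(components.shape, word_logits.shape)  # (8, 3, 512) (8, 3, 10000)

In the paper's pipeline the predicate component would additionally guide the caption decoder's attention at each step; that part is omitted here for brevity.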