Video Captioning Using Global-Local Representation

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local vision representation for sentence generation, leaving plenty of room for improvement. In this work, we...

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, no. 10, pp. 6642-6656
Main Authors	Yan, Liqi; Ma, Siqi; Wang, Qifan; Chen, Yingjie; Zhang, Xiangyu; Savakis, Andreas; Liu, Dongfang
Format	Journal Article
Language	English
Published	United States: IEEE, 01.10.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Coders; Computer vision; Correlation; Decoding; Natural language processing; Representations; Semantics; Task analysis; Training; Video captioning; Video representation; Vision; Visual analysis; Visualization; Vocabulary
