Task-Adaptive Attention for Image Captioning
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 1, pp. 43-51
Main Authors:
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2022
Summary: Attention mechanisms are now widely used in image captioning models. However, most attention models focus only on visual features, while generating syntax-related words requires little visual information; in such cases, purely visual attention can mislead word generation. In this paper, we propose a Task-Adaptive Attention module for image captioning that alleviates this misleading problem and learns implicit non-visual clues helpful for generating non-visual words. We further introduce a diversity regularization to enhance the expressive ability of the Task-Adaptive Attention module. Extensive experiments on the MSCOCO captioning dataset demonstrate that plugging our Task-Adaptive Attention module into a vanilla Transformer-based image captioning model yields performance improvements.
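The core idea in the summary — letting attention fall back on learned non-visual clues when a word (e.g. a function word) needs little visual input — can be illustrated with a minimal sketch. This is not the paper's actual formulation; all names, shapes, and the single-head dot-product form are assumptions for illustration. The sketch augments the visual feature pool with learned "task slots" and adds a simple pairwise-similarity penalty standing in for the diversity regularization:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def task_adaptive_attention(query, visual_feats, task_slots):
    """Hypothetical sketch: attend over visual features plus learned
    non-visual 'task slots', so attention mass can shift off the image
    when the next word is syntax-related rather than visual."""
    # Key/value pool = visual features (N, d) + learned slots (S, d)
    pool = np.concatenate([visual_feats, task_slots], axis=0)   # (N+S, d)
    scores = pool @ query / np.sqrt(query.shape[0])             # (N+S,)
    weights = softmax(scores)
    context = weights @ pool                                    # (d,)
    # Fraction of attention mass routed to the non-visual slots
    non_visual_mass = weights[len(visual_feats):].sum()
    return context, non_visual_mass

def diversity_penalty(task_slots):
    """Stand-in for the diversity regularization: penalize pairwise
    cosine similarity so the slots encode distinct non-visual clues."""
    u = task_slots / np.linalg.norm(task_slots, axis=1, keepdims=True)
    sim = u @ u.T                                # (S, S) cosine matrix
    off_diag = sim - np.eye(len(task_slots))     # zero the diagonal
    return np.square(off_diag).sum() / 2.0       # sum over slot pairs
```

In a full Transformer captioner the slots would be trainable parameters and the penalty would be added to the caption cross-entropy loss; here they are plain arrays to keep the mechanism visible.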
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3067449