A neural image captioning model with caption-to-images semantic reconstructor
| Published in | Neurocomputing (Amsterdam), Vol. 367, pp. 144–151 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V, 20.11.2019 |
Summary: Current dominant image captioning models are mostly built on a CNN-LSTM encoder-decoder framework. Although this architecture has achieved remarkable progress, it still fails to fully exploit the encoded image information: the model relies only on the image-to-caption dependency during caption generation. In this paper, we extend the conventional CNN-LSTM image captioning model by introducing a caption-to-images semantic reconstructor, which reconstructs the semantic representations of the input image and its similar images from the hidden states of the decoder. Serving as an auxiliary objective that evaluates the fidelity of the generated caption, the reconstruction score of the semantic reconstructor is combined with the likelihood to refine model training. In this way, the semantics of the input image can be transferred to the decoder more effectively and fully exploited to generate better captions. Moreover, during testing, the reconstruction score can be used along with the log-likelihood to select a better caption via reranking. Experimental results show that the proposed model significantly improves the quality of the generated captions and outperforms a conventional image captioning model, LSTM-A5.
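The test-time reranking described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the tuple layout of the candidate list, and the linear weighting `recon_weight` are all assumptions; the paper's exact combination of log-likelihood and reconstruction score may differ.

```python
def rerank_captions(candidates, recon_weight=0.5):
    """Pick the caption maximizing log-likelihood + weighted reconstruction score.

    candidates: list of (caption, log_likelihood, reconstruction_score) tuples,
    where reconstruction_score (higher is better) measures how well the decoder
    hidden states reconstruct the semantics of the input image.
    The linear combination with recon_weight is an illustrative assumption.
    """
    def combined(cand):
        _caption, log_lik, recon = cand
        return log_lik + recon_weight * recon
    # max over the combined score implements the reranking step
    return max(candidates, key=combined)[0]

# Example: the second beam has a slightly lower likelihood but reconstructs
# the image semantics much better, so it wins after reranking.
beams = [
    ("a man riding a horse", -4.2, 0.10),
    ("a man riding a brown horse on a beach", -4.5, 0.90),
]
print(rerank_captions(beams))  # -> a man riding a brown horse on a beach
```

With `recon_weight=0`, the function degenerates to ordinary likelihood-based selection, which makes the contribution of the reconstruction score easy to isolate.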
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2019.08.012