Review of Image Captioning Methods Based on Encoding-Decoding Technology

Bibliographic Details
Published in	Jisuanji kexue yu tansuo, Vol. 16, no. 10, pp. 2234-2248
Authors	GENG Yaogang, MEI Hongyan, ZHANG Xing, LI Xiaohui
Format	Journal Article
Language	Chinese
Published	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 01.10.2022
Subjects	image caption generation; encode; decode; multimodal; attention mechanism
Summary:	Image caption generation is a multimodal task in artificial intelligence that combines research in computer vision and natural language processing to convert images into text. It plays an important role in visual assistance and image understanding and has attracted extensive attention from researchers. This paper first describes the image caption generation task and introduces three families of methods: template-based, retrieval-based, and encoding-decoding. For each, it presents the underlying idea, representative work, and the advantages and disadvantages. It then examines encoding-decoding methods in detail from three angles (model structure, progress in the image understanding phase, and progress in the caption generation phase) and organizes prior research into two areas: image understanding and caption generation. Image understanding research includes attention mechani
ISSN:	1673-9418
DOI:	10.3778/j.issn.1673-9418.2112080