An image caption method based on object detection

How to represent image information more effectively is the key to the task of image caption. In the existing research, a large number of image caption methods are proposed. Most of them use the global information of the image, and the information in the image that is not related to the caption gener...

Full description

Saved in:
Bibliographic Details
Published inMultimedia tools and applications Vol. 78; no. 24; pp. 35329 - 35350
Main Authors Cao, Danyang, Zhu, Menggui, Gao, Lei
Format Journal Article
LanguageEnglish
Published New York Springer US 01.12.2019
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:How to represent image information more effectively is the key to the task of image caption. In the existing research, a large number of image caption methods are proposed. Most of them use the global information of the image, and the information in the image that is not related to the caption generation also participates in the calculation, caused a certain amount of waste of resources. In order to solve this problem, a method of generating image caption based on object detection is proposed in this paper. Firstly, the object detection algorithm is used to extract image feature, only the features of meaningful regions in the image are used, and then image caption is generated by combining the spatial attention mechanism with the caption generation network. Experiments show that the image feature of the object region and the salient region are sufficient to represent the information of the entire image in the image caption task. For better convergence of the model, this paper also uses a new strategy for model training. The experimental results show that the proposed model in this paper work well on the test dataset of image caption, and it has created a precedent for new technology to a large extent.
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-019-08116-9