An image caption method based on object detection

Bibliographic Details
Published in	Multimedia Tools and Applications Vol. 78; no. 24; pp. 35329–35350
Main Authors	Cao, Danyang, Zhu, Menggui, Gao, Lei
Format	Journal Article
Language	English
Published	New York: Springer US, 01.12.2019; Springer Nature B.V.
Subjects	Algorithms; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Deep learning; Feature extraction; Image detection; Machine translation; Multimedia; Multimedia Information Systems; Natural language; Neural networks; New technology; Object recognition; Special Purpose and Application-Based Systems; Teaching methods; Image caption; Attention mechanism; Object detection

Summary:	Representing image information effectively is the key to the image captioning task. Many image captioning methods have been proposed in existing research. Most of them use the global information of the image, so information unrelated to caption generation also takes part in the computation, which wastes resources. To address this problem, this paper proposes an image captioning method based on object detection. First, an object detection algorithm extracts image features, so that only the features of meaningful regions in the image are used; the caption is then generated by combining a spatial attention mechanism with the caption generation network. Experiments show that the image features of the object regions and the salient regions are sufficient to represent the information of the entire image in the image captioning task. For better convergence of the model, this paper also uses a new training strategy. The experimental results show that the proposed model works well on the image captioning test dataset and, the authors argue, it largely sets a precedent for new technology.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-019-08116-9
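
Illustrative sketch (not taken from the paper): the summary describes attending over features of detected object regions rather than over the whole image. The record does not give the detector, the feature dimensions, or the exact form of the attention, so the block below assumes additive soft attention, a common choice, and uses random arrays in place of real detector output. All names (`spatial_attention`, `W_v`, `W_h`, `w_a`) and sizes are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def spatial_attention(region_feats, hidden, W_v, W_h, w_a):
    """Additive soft attention over region features (assumed form).

    region_feats: (k, d) array, one feature vector per detected region
    hidden:       (h,) decoder hidden state at the current word step
    W_v: (d, a), W_h: (h, a), w_a: (a,) are learned projections (hypothetical)
    Returns the attended context vector (d,) and the region weights (k,).
    """
    # Score each region against the current decoder state.
    scores = np.tanh(region_feats @ W_v + hidden @ W_h) @ w_a  # shape (k,)
    weights = softmax(scores)                                  # sums to 1
    # Weighted sum of region features; background outside the
    # detected regions never enters the computation.
    context = weights @ region_feats                           # shape (d,)
    return context, weights


# Random stand-ins for detector output and decoder state (assumed sizes).
rng = np.random.default_rng(0)
k, d, h, a = 5, 8, 6, 4
region_feats = rng.normal(size=(k, d))
hidden = rng.normal(size=h)
W_v = rng.normal(size=(d, a))
W_h = rng.normal(size=(h, a))
w_a = rng.normal(size=a)

context, weights = spatial_attention(region_feats, hidden, W_v, W_h, w_a)
```

In a full captioning model the context vector would be fed to the caption generation network at each word step, together with the previous word; that part is omitted here.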