Image caption generation with high-level image features
Published in | Pattern Recognition Letters, Vol. 123, pp. 89–95 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | Amsterdam: Elsevier B.V. / Elsevier Science Ltd, 15.05.2019 |
Summary:
• Introduce the theory of attention from psychology to image captioning and use it to filter image features.
• Combine low-level information with high-level features to detect the attention regions of an image.
• The LSTM variant model is affected not only by long-term information but also by the rules of attention.
• Quantitatively validate the good performance of our method on several benchmark datasets.
Recently, caption generation for images and videos has attracted great interest. However, it remains challenging for models to select the proper subjects against a complex background and to generate the desired captions in high-level vision tasks. Inspired by recent work, we propose a novel image captioning model based on high-level image features. We combine low-level information, such as image quality, with high-level features, such as motion classification and face recognition, to detect the attention regions of an image. We demonstrate that our attention model performs well in experiments on the MSCOCO, Flickr 30K, PASCAL and SBU datasets.
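For context, the following is a minimal sketch of the kind of model the abstract describes: attention weights are computed by fusing low-level and high-level region features, and the resulting attended context conditions an LSTM caption decoder. This is not the authors' implementation; the module names, feature dimensions, and fusion scheme below are illustrative assumptions.

```python
# Illustrative sketch only -- NOT the paper's code. The dimensions and
# fusion scheme are assumptions chosen for a self-contained example.
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    """Scores image regions by fusing low-level and high-level cues."""
    def __init__(self, low_dim, high_dim, hidden_dim):
        super().__init__()
        self.low_proj = nn.Linear(low_dim, hidden_dim)
        self.high_proj = nn.Linear(high_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, low_feats, high_feats):
        # low_feats: (B, R, low_dim); high_feats: (B, R, high_dim)
        h = torch.tanh(self.low_proj(low_feats) + self.high_proj(high_feats))
        alpha = torch.softmax(self.score(h).squeeze(-1), dim=1)   # (B, R)
        # Attention-weighted sum of high-level region features.
        context = (alpha.unsqueeze(-1) * high_feats).sum(dim=1)   # (B, high_dim)
        return context, alpha

class CaptionDecoder(nn.Module):
    """LSTM decoder conditioned on the attended image context."""
    def __init__(self, vocab_size, embed_dim, high_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim + high_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, context):
        # tokens: (B, T) ground-truth token ids (teacher forcing).
        B, T = tokens.shape
        h = torch.zeros(B, self.cell.hidden_size)
        c = torch.zeros(B, self.cell.hidden_size)
        logits = []
        for t in range(T):
            x = torch.cat([self.embed(tokens[:, t]), context], dim=-1)
            h, c = self.cell(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                         # (B, T, V)

# Toy usage with random tensors standing in for CNN region features.
B, R = 2, 49
low, high = torch.randn(B, R, 64), torch.randn(B, R, 512)
context, alpha = RegionAttention(64, 512, 256)(low, high)
decoder = CaptionDecoder(vocab_size=1000, embed_dim=128, high_dim=512, hidden_dim=256)
logits = decoder(torch.randint(0, 1000, (B, 12)), context)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

In the paper's framing, cues such as image quality (low-level) and motion or face detections (high-level) would stand in for the random tensors used here.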
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2019.03.021