Image caption generation with high-level image features



Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 123, pp. 89-95
Main Authors: Ding, Songtao; Qu, Shiru; Xi, Yuling; Sangaiah, Arun Kumar; Wan, Shaohua
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V. (Elsevier Science Ltd), 15.05.2019

Summary:
•Introduce the theory of attention from psychology into image captioning and use it to filter image features.
•Combine low-level information with high-level features to detect the attention regions of an image.
•The LSTM variant model is affected not only by long-term information but also by the rules of attention.
•Quantitatively validate the good performance of our method on several benchmark datasets.

Recently, caption generation for images and videos has attracted considerable interest. However, it remains challenging for models to select the proper subjects against a complex background and generate the desired captions in high-level vision tasks. Inspired by recent work, we propose a novel image captioning model based on high-level image features. We combine low-level information, such as image quality, with high-level features, such as motion classification and face recognition, to detect the attention regions of an image. We demonstrate that our attention model performs well in experiments on the MSCOCO, Flickr30K, PASCAL and SBU datasets.
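The fusion of low-level cues (e.g. image quality) with high-level cues (e.g. face or motion detections) into attention weights over candidate regions could be sketched as below. This is a minimal illustration under assumed inputs, not the authors' implementation: the per-region score maps, the mixing coefficient `alpha`, and the softmax normalization are all assumptions.

```python
import numpy as np

def attention_weights(low_level, high_level, alpha=0.5):
    """Fuse per-region low-level scores (e.g. image quality) with
    high-level scores (e.g. face/motion detector responses) into
    normalized attention weights. `alpha` balances the two cues."""
    fused = alpha * low_level + (1.0 - alpha) * high_level
    exp = np.exp(fused - fused.max())  # softmax for a proper distribution
    return exp / exp.sum()

def attend(region_features, weights):
    """Weighted sum of per-region feature vectors -> attended context."""
    return weights @ region_features

# Toy example: 4 candidate regions, each with a 5-dim feature vector.
low = np.array([0.2, 0.9, 0.4, 0.1])    # hypothetical quality scores
high = np.array([0.0, 1.0, 0.8, 0.0])   # hypothetical detector scores
feats = np.random.rand(4, 5)

w = attention_weights(low, high)
context = attend(feats, w)
```

Region 1, which scores highest on both cues, receives the largest attention weight, so its features dominate the attended context vector fed to the caption decoder.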
ISSN: 0167-8655
EISSN: 1872-7344
DOI: 10.1016/j.patrec.2019.03.021