Image caption generation with high-level image features
Published in | Pattern Recognition Letters, Vol. 123, pp. 89–95 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | Amsterdam: Elsevier B.V. / Elsevier Science Ltd, 15.05.2019 |
Summary:
• Introduce the theory of attention from psychology to image captioning and use it to filter image features.
• Combine low-level information with high-level features to detect the attention regions of an image.
• The LSTM variant model is affected not only by long-term information but also by the rules of attention.
• Quantitatively validate the good performance of our method on several benchmark datasets.
Recently, caption generation for images and videos has attracted great interest. However, it remains challenging for models to select the proper subjects against a complex background and to generate the desired captions in high-level vision tasks. Inspired by recent work, we propose a novel image captioning model based on high-level image features. We combine low-level information, such as image quality, with high-level features, such as motion classification and face recognition, to detect the attention regions of an image. We demonstrate that our attention model performs well in experiments on the MSCOCO, Flickr 30K, PASCAL and SBU datasets.
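For context, the following is a minimal sketch of the kind of model the abstract describes: attention weights are computed by fusing low-level and high-level region features, and the resulting attended context conditions an LSTM caption decoder. This is not the authors' implementation; the module names, feature dimensions, and fusion scheme below are illustrative assumptions.

```python
# Illustrative sketch only -- NOT the paper's code. The dimensions and
# fusion scheme are assumptions chosen for a self-contained example.
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    """Scores image regions by fusing low-level and high-level cues."""
    def __init__(self, low_dim, high_dim, hidden_dim):
        super().__init__()
        self.low_proj = nn.Linear(low_dim, hidden_dim)
        self.high_proj = nn.Linear(high_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, low_feats, high_feats):
        # low_feats: (B, R, low_dim); high_feats: (B, R, high_dim)
        h = torch.tanh(self.low_proj(low_feats) + self.high_proj(high_feats))
        alpha = torch.softmax(self.score(h).squeeze(-1), dim=1)   # (B, R)
        # Attention-weighted sum of high-level region features.
        context = (alpha.unsqueeze(-1) * high_feats).sum(dim=1)   # (B, high_dim)
        return context, alpha

class CaptionDecoder(nn.Module):
    """LSTM decoder conditioned on the attended image context."""
    def __init__(self, vocab_size, embed_dim, high_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim + high_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, context):
        # tokens: (B, T) ground-truth token ids (teacher forcing).
        B, T = tokens.shape
        h = torch.zeros(B, self.cell.hidden_size)
        c = torch.zeros(B, self.cell.hidden_size)
        logits = []
        for t in range(T):
            x = torch.cat([self.embed(tokens[:, t]), context], dim=-1)
            h, c = self.cell(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                         # (B, T, V)

# Toy usage with random tensors standing in for CNN region features.
B, R = 2, 49
low, high = torch.randn(B, R, 64), torch.randn(B, R, 512)
context, alpha = RegionAttention(64, 512, 256)(low, high)
decoder = CaptionDecoder(vocab_size=1000, embed_dim=128, high_dim=512, hidden_dim=256)
logits = decoder(torch.randint(0, 1000, (B, 12)), context)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

In the paper's framing, cues such as image quality (low-level) and motion or face detections (high-level) would stand in for the random tensors used here.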
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2019.03.021