A hierarchical recurrent approach to predict scene graphs from a visual‐attention‐oriented perspective

Bibliographic Details
Published in Computational Intelligence Vol. 35; no. 3; pp. 496-516
Main Authors Gao, Wenjing; Zhu, Yonghua; Zhang, Wenjun; Zhang, Ke; Gao, Honghao
Format Journal Article
Language English
Published Hoboken: Blackwell Publishing Ltd, 01.08.2019
Summary: A scene graph provides a powerful intermediate knowledge structure for various visual tasks, including semantic image retrieval, image captioning, and visual question answering. In this paper, the task of predicting a scene graph for an image is formulated as two connected problems, i.e., recognizing the relationship triplets, structured as <subject‐predicate‐object>, and constructing the scene graph from the recognized relationship triplets. For relationship triplet recognition, we develop a novel hierarchical recurrent neural network with a visual attention mechanism. This model is composed of two attention‐based recurrent neural networks in a hierarchical organization. The first network generates a topic vector for each relationship triplet, whereas the second network predicts each word in that relationship triplet given the topic vector. This approach successfully captures the compositional structure and contextual dependency of an image and the relationship triplets describing its scene. For scene graph construction, an entity localization approach to determine the graph structure is presented with the assistance of available attention information. Then, the procedures for automatically converting the generated relationship triplets into a scene graph are clarified through an algorithm. Extensive experimental results on two widely used data sets verify the feasibility of the proposed approach.
Bibliography: Honghao Gao, Shanghai Film Academy, Shanghai University, Shanghai, China
ISSN: 0824-7935, 1467-8640
DOI: 10.1111/coin.12202