Scene-Graph-Guided message passing network for dense captioning

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 145, pp. 187-193
Main Authors: Liu, An-An; Wang, Yanhui; Xu, Ning; Liu, Shan; Li, Xuanya
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V. (Elsevier Science Ltd), 01.05.2021
Summary:
•We propose to leverage rich visual concepts and structured knowledge for dense caption generation.
•We use the objective function of scene graph generation to propagate structured knowledge through the refining pipeline.
•Experimental results and qualitative experiments confirm the effectiveness of our model.
The dense captioning task aims to both localize and describe salient regions of an image in natural language. It can benefit from rich visual concepts, including objects and pair-wise relationships. However, owing to the combinatorial complexity of formulating <subject-predicate-object> triplets, very little work has integrated them into the dense captioning task. Inspired by the recent success of scene graph generation for object and relationship detection, we propose a scene-graph-guided message passing network for dense caption generation. We first exploit message passing between objects and their relationships with a feature-refining structure. Moreover, we formulate message passing as an inter-connected visual concept generation problem, in which the objective function of scene graph generation guides region feature learning. The scene graph guidance propagates the structured knowledge of the graph through a concept-region message passing mechanism (CR-MPM), which improves the regional feature representation. Finally, the refined regional features are encoded by an LSTM-based decoder to generate dense captions. Our model achieves competitive performance on Visual Genome compared against existing methods, and qualitative experiments further confirm its effectiveness on the dense captioning task.
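The abstract describes a two-stage pipeline: region features are first refined by passing messages from scene-graph concepts (objects and relationships), and the refined features are then fed to an LSTM decoder that emits one caption per region. Below is a minimal, illustrative PyTorch sketch of such a pipeline. The module names, feature dimensions, attention-style message weighting, and GRU-based update are assumptions made for illustration only, not the authors' released implementation, and the scene graph generation objective that guides feature learning in the paper is omitted.

```python
import torch
import torch.nn as nn

class ConceptRegionMessagePassing(nn.Module):
    """Illustrative sketch: refine region features with messages aggregated
    from concept (object/relationship) features. Hypothetical design."""
    def __init__(self, dim=512):
        super().__init__()
        self.attn = nn.Linear(dim * 2, 1)   # scores each concept-region pair
        self.update = nn.GRUCell(dim, dim)  # folds the message into the region feature

    def forward(self, region_feats, concept_feats):
        # region_feats: (R, D), concept_feats: (C, D)
        R, D = region_feats.shape
        C = concept_feats.shape[0]
        pairs = torch.cat([
            region_feats.unsqueeze(1).expand(R, C, D),
            concept_feats.unsqueeze(0).expand(R, C, D)], dim=-1)
        weights = torch.softmax(self.attn(pairs).squeeze(-1), dim=1)  # (R, C)
        messages = weights @ concept_feats                            # (R, D)
        return self.update(messages, region_feats)                    # refined (R, D)

class DenseCaptionDecoder(nn.Module):
    """LSTM decoder that turns one refined region feature into caption logits."""
    def __init__(self, vocab_size=1000, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTMCell(dim, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, region_feat, tokens):
        # region_feat: (D,), tokens: (T,) ground-truth word ids (teacher forcing)
        h = region_feat.unsqueeze(0)         # initialize hidden state with the region
        c = torch.zeros_like(h)
        logits = []
        for t in tokens:
            h, c = self.lstm(self.embed(t).unsqueeze(0), (h, c))
            logits.append(self.out(h))
        return torch.cat(logits, dim=0)      # (T, vocab)

# Toy forward pass with random features
regions = torch.randn(4, 512)                # 4 region proposals
concepts = torch.randn(6, 512)               # 6 object/relationship concepts
refined = ConceptRegionMessagePassing()(regions, concepts)
caption_logits = DenseCaptionDecoder()(refined[0], torch.tensor([1, 2, 3]))
print(caption_logits.shape)                  # torch.Size([3, 1000])
```

In the paper's formulation, the refined region features are additionally supervised by the scene graph generation objective, so object and relationship labels shape the representation before captioning; the sketch above omits that loss term for brevity.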
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2021.01.024