Distribution-aware network with context and entity attention for scene graph generation

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, Vol. 160, p. 111984
Main Authors: Pan, Tongling; Wang, Lulu; Zhang, Ruoyu; Yu, Zhengtao; Li, Yingna
Format: Journal Article
Language: English
Published: Elsevier Ltd, 27.11.2025
Summary: Scene Graph Generation (SGG) aims to detect objects and infer their pairwise relationships in images, forming a structured semantic graph. Despite recent advances, existing methods struggle to capture rich contextual dependencies and suffer from biased relation prediction due to long-tail data distributions. To address these challenges, we propose a novel SGG framework, named DANCE, which integrates three key modules: the Context-Augmented Message Passing (CAMP) module, the Distribution-aware Dynamic Weighted Loss (DDW-Loss), and the Dual-Entity Attention Enhancement (DEAE) mechanism. Specifically, the CAMP module leverages a Gated Recurrent Unit (GRU)-based architecture to reason over the scene graph, capturing entity interactions through sequential context encoding. To mitigate noise from random graph connections, we incorporate multi-head attention into the GRU and fuse its output with the initial visual features via residual connections. This design enhances context propagation while maintaining stability. Furthermore, the DDW-Loss function dynamically adjusts the loss weights of relation categories based on their frequency distribution, thus improving the learning of semantically meaningful but infrequent relations. Finally, the DEAE module employs parallel multi-head attention over subject and object features, enabling the model to extract fine-grained semantic dependencies and generate more discriminative relational embeddings. Experimental results on three popular datasets demonstrate that our method significantly improves scene graph generation performance and outperforms existing methods. In addition, the proposed method exhibits strong adaptability and robustness in real-world scenarios characterized by complex scenes, noisy inputs, and varying image resolutions, highlighting its generalization capability for practical deployment.
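The abstract does not spell out the DDW-Loss formula. The sketch below illustrates one common distribution-aware weighting scheme that matches the stated idea: relation categories are weighted by inverse frequency so that rare but semantically meaningful relations contribute more to the loss. The softening exponent `gamma` and the mean normalization are assumptions for illustration, not details from the paper.

```python
import numpy as np

def distribution_aware_weights(counts, gamma=0.5):
    """Illustrative distribution-aware class weights: rarer relation
    categories receive larger loss weights.  `gamma` (an assumption)
    controls how aggressively the long tail is up-weighted."""
    counts = np.asarray(counts, dtype=float)
    freq = counts / counts.sum()      # empirical relation frequencies
    w = freq ** (-gamma)              # inverse-frequency weighting
    return w / w.mean()               # normalize weights to mean 1

# Toy long-tailed relation distribution, e.g. "on" frequent, "riding" rare.
counts = [9000, 800, 150, 50]
weights = distribution_aware_weights(counts)
print(np.round(weights, 2))           # monotonically increasing weights
```

Such a weight vector would typically be passed to a class-weighted cross-entropy loss during relation classification; a dynamic variant could recompute it per batch from the running frequency distribution.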
• GRU-based graph reasoning with multi-head attention and residuals for stability.
• Distribution-aware dynamic loss weighting to alleviate long-tail relation bias.
• Dual-entity attention capturing subject–object context for discriminative features.
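To make the dual-entity idea concrete, here is a minimal NumPy sketch: attention is applied in parallel over subject and object feature sets, and the two context vectors are concatenated into a relational embedding. A single attention head and self-attention over each stream are simplifications for brevity (the paper's DEAE uses multi-head attention); all shapes and names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention (single head for brevity)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def dual_entity_embedding(subj_feats, obj_feats):
    """Sketch of the dual-entity scheme: attend over subject and
    object features in parallel, then concatenate the two context
    vectors into one relational embedding."""
    subj_ctx = attention(subj_feats, subj_feats, subj_feats)
    obj_ctx = attention(obj_feats, obj_feats, obj_feats)
    return np.concatenate([subj_ctx, obj_ctx], axis=-1)

rng = np.random.default_rng(0)
subj = rng.standard_normal((5, 64))   # 5 subject proposals, 64-d features
obj = rng.standard_normal((5, 64))    # 5 object proposals, 64-d features
rel = dual_entity_embedding(subj, obj)
print(rel.shape)                      # (5, 128)
```

The concatenated embedding would then feed the relation classifier; keeping the subject and object streams separate until the final fusion is what lets each attention branch specialize.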
ISSN: 0952-1976
DOI: 10.1016/j.engappai.2025.111984