Distribution-aware network with context and entity attention for scene graph generation
Published in: Engineering Applications of Artificial Intelligence, Vol. 160, p. 111984
Format: Journal Article
Language: English
Published: Elsevier Ltd, 27.11.2025
Summary: Scene Graph Generation (SGG) aims to detect objects and infer their pairwise relationships in images, forming a structured semantic graph. Despite recent advances, existing methods struggle to capture rich contextual dependencies and suffer from biased relation prediction due to long-tail data distributions. To address these challenges, we propose a novel SGG framework, named DANCE, which integrates three key modules: the Context-Augmented Message Passing (CAMP) module, the Distribution-aware Dynamic Weighted Loss (DDW-Loss), and the Dual-Entity Attention Enhancement (DEAE) mechanism. Specifically, the CAMP module leverages a Gated Recurrent Unit (GRU) based architecture to reason over the scene graph, capturing entity interactions through sequential context encoding. To mitigate noise from random graph connections, we incorporate multi-head attention into the GRU and fuse its output with the initial visual features via residual connections. This design enhances context propagation while maintaining stability. Furthermore, the DDW-Loss function dynamically adjusts the loss weights of relation categories based on their frequency distribution, thus improving the learning of semantically meaningful but infrequent relations. Finally, the DEAE module employs parallel multi-head attention over subject and object features, enabling the model to extract fine-grained semantic dependencies and generate more discriminative relational embeddings. Experimental results on three popular datasets demonstrate that our method significantly improves scene graph generation performance and outperforms existing methods. In addition, the proposed method exhibits strong adaptability and robustness in real-world scenarios characterized by complex scenes, noisy inputs, and varying image resolutions, highlighting its superior generalization capability for practical deployment.
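The abstract says only that DDW-Loss scales per-relation loss weights by category frequency, without giving the exact rule. The sketch below illustrates the general idea with a stand-in effective-number re-weighting scheme: rare relation categories receive larger weights so they contribute more to training. The function names and the `beta` parameter are illustrative, not taken from the paper.

```python
import math

def ddw_weights(class_freq, beta=0.999):
    """Illustrative distribution-aware weights: weight_c = (1 - beta) / (1 - beta**n_c),
    where n_c is the training frequency of relation category c. Rarer categories
    get larger weights; `beta` controls how aggressively the tail is boosted."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_freq]
    scale = len(raw) / sum(raw)  # normalize so the weights average to 1
    return [w * scale for w in raw]

def weighted_nll(log_probs, target, weights):
    """Weighted negative log-likelihood for one sample, given per-class weights."""
    return -weights[target] * log_probs[target]
```

Usage: compute `ddw_weights` once from the training-set relation histogram, then apply them inside the relation classification loss; head categories are down-weighted and tail categories up-weighted, which is the bias-mitigation effect the abstract describes.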
Highlights:
• GRU-based graph reasoning with multi-head attention and residuals for stability.
• Distribution-aware dynamic loss weighting to alleviate long-tail relation bias.
• Dual-entity attention capturing subject–object context for discriminative features.
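As a rough illustration of the dual-entity idea in the last highlight, the sketch below attends over subject and object feature sets in parallel, fuses each result with its input via a residual add, and concatenates the two streams into a relation embedding. It is a simplification under stated assumptions: single-head attention with no learned projections, whereas the paper uses parallel multi-head attention; all names here are hypothetical.

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Standard scaled dot-product attention (single head for brevity)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V

def dual_entity_enhance(subj, obj):
    """Attend over subject and object features separately (self-attention),
    add a residual connection to each stream, and concatenate the results
    into a joint relational embedding."""
    subj_ctx = subj + scaled_dot_attention(subj, subj, subj)  # residual fusion
    obj_ctx = obj + scaled_dot_attention(obj, obj, obj)
    return np.concatenate([subj_ctx, obj_ctx], axis=-1)
```

The residual add mirrors the stability trick the abstract describes for CAMP: the attended context refines, rather than replaces, the original visual features.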
ISSN: 0952-1976
DOI: 10.1016/j.engappai.2025.111984