Union-Redefined Prototype Network for scene graph generation


Bibliographic Details
Published in: Expert Systems with Applications, Vol. 280, p. 127486
Main Authors: Jung, NamGyu; Choi, Chang
Format: Journal Article
Language: English
Published: Elsevier Ltd, 25.06.2025

Summary: Recent advances in scene graph generation employ commonsense knowledge to model visual prototypes for various predicates. However, indiscriminate application of prototypes to all entity pairs within a predicate can lead to confusion when interpreting the same entity pairs that share similar visual features but correspond to different predicates. In particular, clearly distinguishing predicates often requires careful consideration of subtle factors, such as gaze direction, spatial distance, and surrounding context. While these aspects are inherently part of the non-overlapping regions between the subject and object, they are frequently overlooked in practice. In this paper, we propose the Union-Redefined Prototype Network (UP-Net), which effectively captures predicate-specific visual nuances by leveraging the subject and object's positional information and non-overlapping areas. Our method redefines entity pairs by incorporating the often-overlooked union region, excluding the intersection between the subject and object. Furthermore, we enhance the discriminative power among predicates by increasing the divergence between predicate-specific representations of the same entity pairs, thereby capturing the subtle visual nuances associated with each predicate. Extensive experiments on the Visual Genome dataset demonstrate that our approach achieves state-of-the-art performance.
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2025.127486
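
The abstract describes two mechanisms only at a high level: redefining the entity-pair region as the union of the subject and object boxes with their intersection removed, and increasing the divergence between predicate-specific representations of the same entity pair. The sketch below is a minimal, hypothetical illustration of those two ideas in PyTorch, not the authors' implementation; the function names (union_minus_intersection_mask, pairwise_divergence_loss), the mask resolution, and the cosine-based divergence term are all assumptions made for illustration.

# Hypothetical sketch of the two ideas summarized in the abstract, NOT the
# paper's implementation: (1) a spatial mask covering the union of the
# subject/object boxes with their overlap removed, and (2) a simple term
# that pushes apart predicate-specific representations of the same pair.
import torch
import torch.nn.functional as F


def union_minus_intersection_mask(subj_box, obj_box, size=32):
    """Rasterize a (size x size) binary mask of the union region of two
    boxes with their overlap removed. Boxes are (x1, y1, x2, y2) in [0, 1]."""
    ys = torch.linspace(0.0, 1.0, size).view(size, 1).expand(size, size)
    xs = torch.linspace(0.0, 1.0, size).view(1, size).expand(size, size)

    def inside(box):
        x1, y1, x2, y2 = box
        return ((xs >= x1) & (xs <= x2) & (ys >= y1) & (ys <= y2)).float()

    subj, obj = inside(subj_box), inside(obj_box)
    union = torch.clamp(subj + obj, max=1.0)      # subject OR object region
    inter = subj * obj                            # subject AND object region
    return union - inter                          # union with overlap removed


def pairwise_divergence_loss(predicate_reprs):
    """Encourage predicate-specific representations of the SAME entity pair
    to diverge (off-diagonal cosine similarities pushed toward zero).
    predicate_reprs: (P, D) tensor, one row per candidate predicate."""
    normed = F.normalize(predicate_reprs, dim=-1)
    sim = normed @ normed.t()                     # (P, P) cosine similarities
    off_diag = sim - torch.diag(torch.diag(sim))  # drop self-similarity
    return off_diag.abs().mean()


# Toy usage with made-up boxes and random predicate representations.
subj = torch.tensor([0.10, 0.20, 0.60, 0.80])
obj = torch.tensor([0.40, 0.30, 0.90, 0.70])
mask = union_minus_intersection_mask(subj, obj)
reprs = torch.randn(5, 128)                       # 5 candidate predicates
print(mask.sum().item(), pairwise_divergence_loss(reprs).item())

In this toy version the mask could be concatenated with visual features as an extra channel, and the divergence term added to the training loss with a small weight; how UP-Net actually fuses the redefined union region and enforces divergence is specified only in the full paper.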