Union-Redefined Prototype Network for scene graph generation
| Published in | Expert Systems with Applications, Vol. 280, p. 127486 |
| --- | --- |
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 25.06.2025 |
Summary: Recent advances in scene graph generation employ commonsense knowledge to model visual prototypes for various predicates. However, indiscriminate application of prototypes to all entity pairs within a predicate can lead to confusion when interpreting the same entity pairs that share similar visual features but correspond to different predicates. In particular, clearly distinguishing predicates often requires careful consideration of subtle factors, such as gaze direction, spatial distance, and surrounding context. While these aspects are inherently part of the non-overlapping regions between the subject and object, they are frequently overlooked in practice. In this paper, we propose the Union-Redefined Prototype Network (UP-Net), which effectively captures predicate-specific visual nuances by leveraging the subject and object's positional information and non-overlapping areas. Our method redefines entity pairs by incorporating the often-overlooked union region, excluding the intersection between the subject and object. Furthermore, we enhance the discriminative power among predicates by increasing the divergence between predicate-specific representations of the same entity pairs, thereby capturing the subtle visual nuances associated with each predicate. Extensive experiments on the Visual Genome dataset demonstrate that our approach achieves state-of-the-art performance.
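The record does not include the authors' implementation, but the "union region, excluding the intersection" operation from the abstract can be illustrated geometrically. Below is a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) in pixel coordinates; the function name and box layout are hypothetical, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of a union-minus-intersection
# region mask for a subject/object box pair, as described in the abstract.
import numpy as np

def union_region_mask(subj_box, obj_box, image_hw):
    """Binary mask over the subject-object union box, with the
    subject/object intersection zeroed out."""
    h, w = image_hw
    mask = np.zeros((h, w), dtype=np.uint8)

    # Enclosing (union) box of subject and object.
    ux1 = min(subj_box[0], obj_box[0])
    uy1 = min(subj_box[1], obj_box[1])
    ux2 = max(subj_box[2], obj_box[2])
    uy2 = max(subj_box[3], obj_box[3])
    mask[uy1:uy2, ux1:ux2] = 1

    # Intersection box of subject and object, if they overlap.
    ix1 = max(subj_box[0], obj_box[0])
    iy1 = max(subj_box[1], obj_box[1])
    ix2 = min(subj_box[2], obj_box[2])
    iy2 = min(subj_box[3], obj_box[3])
    if ix1 < ix2 and iy1 < iy2:
        mask[iy1:iy2, ix1:ix2] = 0  # exclude the overlapping area

    return mask

# Example: two overlapping boxes in a 300x300 image.
m = union_region_mask((40, 20, 120, 200), (10, 120, 200, 260), (300, 300))
print(m.sum())  # area of the union box minus the subject-object overlap
```

Masking out the overlap keeps exactly the non-overlapping context (gaze direction, spatial gap, surroundings) that the abstract argues standard union features dilute; how UP-Net actually pools features from this region is described in the full paper.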
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2025.127486