Modelling relations with prototypes for visual relation detection
Published in | Multimedia Tools and Applications, Vol. 80, No. 15, pp. 22465–22486 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | New York: Springer US, 01.06.2021 (Springer Nature B.V.; Springer Verlag) |
Summary: Relations between objects drive our understanding of images. Modelling them poses several challenges due to the combinatorial nature of the problem and the complex structure of natural language. This paper tackles the task of predicting relationships in the form of (subject, relation, object) triplets from still images. To address these challenges, the authors propose a framework for learning relation prototypes that aims to capture the complex nature of relation distributions. Concurrently, a network is trained to define a space in which relationship triplets with similar spatial layouts, interacting objects and relations are clustered together. Finally, the network is compared against two models that explicitly tackle synonymy among relations. Two well-known scene-graph labelling benchmarks are used for training and testing: VRD and Visual Genome. Predicting relations by distance to prototype significantly increases the diversity of predicted relations, improving the average relation recall from 40.3% to 41.7% on VRD and from 31.3% to 35.4% on Visual Genome.
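The summary describes classifying a relation by the distance from a learned triplet embedding to per-relation prototype vectors. The sketch below illustrates that nearest-prototype idea only; it is not the paper's implementation, and all names (`predict_relation`, `prototypes`) and the random tensors standing in for a trained network's outputs are hypothetical.

```python
import torch
import torch.nn.functional as F

def predict_relation(triplet_embedding: torch.Tensor,
                     prototypes: torch.Tensor) -> int:
    """Return the index of the relation whose prototype is nearest.

    triplet_embedding: (d,) embedding of one (subject, object) pair,
                       assumed to come from a trained network
    prototypes:        (num_relations, d) learned prototype vectors
    """
    # Euclidean distance from the embedding to every prototype;
    # torch.cdist expects 2-D inputs, hence the unsqueeze/squeeze.
    dists = torch.cdist(triplet_embedding.unsqueeze(0), prototypes).squeeze(0)
    return int(torch.argmin(dists))

# Toy usage: random tensors stand in for real model outputs.
d, num_relations = 128, 70          # 70 relation classes, as in the VRD benchmark
prototypes = F.normalize(torch.randn(num_relations, d), dim=1)
embedding = F.normalize(torch.randn(d), dim=0)
print(predict_relation(embedding, prototypes))
```

Predicting by distance to a prototype, rather than with a flat classifier, is what lets triplets near the same prototype share a relation label even when the ground-truth annotations use different synonymous predicates.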
ISSN: 1380-7501 (print); 1573-7721 (electronic)
DOI: 10.1007/s11042-020-09001-6