VrR-VG: Refocusing Visually-Relevant Relationships

Bibliographic Details
Published in: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10402-10411
Main Authors: Liang, Yuanzhi; Bai, Yalong; Zhang, Wei; Qian, Xueming; Zhu, Li; Mei, Tao
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2019

Summary: Relationships encode the interactions among individual instances and play a critical role in deep visual scene understanding. Because many relationships are highly predictable from non-visual information, relationship models tend to fit statistical bias rather than "learning" to infer relationships from images. To encourage further progress on visual relationships, we propose a novel method to mine more valuable relationships by automatically pruning visually-irrelevant ones. We construct a new scene graph dataset named the Visually-Relevant Relationships Dataset (VrR-VG), based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical methods is more significant in VrR-VG, and frequency-based analysis no longer works. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes, and relationships. By applying the representation-aware features learned on VrR-VG, the performance of image captioning and visual question answering is systematically improved, which demonstrates the effectiveness of both our dataset and our feature embedding schema. Both the VrR-VG dataset and the representation-aware features will be made publicly available soon.
ISSN: 2380-7504
DOI: 10.1109/ICCV.2019.01050
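
The pruning idea described in the summary can be illustrated with a minimal frequency-based sketch: if a predicate is almost fully determined by the subject and object labels alone, the relationship can be guessed without looking at the image and is a candidate for removal as visually irrelevant. This is a simplified illustration under assumed data, not the paper's exact selection criterion (which relies on a learnable hypothesis model); the function name and toy triplets below are hypothetical.

```python
from collections import Counter, defaultdict

def predicate_predictability(triplets):
    """For each (subject, object) label pair, return the share of its most
    frequent predicate. Pairs with a share near 1.0 are predictable from
    labels alone, i.e. candidates for pruning as visually irrelevant."""
    pair_counts = defaultdict(Counter)
    for subj, pred, obj in triplets:
        pair_counts[(subj, obj)][pred] += 1
    scores = {}
    for pair, preds in pair_counts.items():
        total = sum(preds.values())
        scores[pair] = preds.most_common(1)[0][1] / total
    return scores

# Toy usage: "man-wears-shirt" is nearly deterministic from the labels,
# while the man-horse predicate varies and is thus more visually relevant.
triplets = [
    ("man", "wears", "shirt"), ("man", "wears", "shirt"),
    ("man", "riding", "horse"), ("man", "feeding", "horse"),
]
scores = predicate_predictability(triplets)
prunable_pairs = {pair for pair, s in scores.items() if s > 0.9}
print(scores, prunable_pairs)
```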