Appearance difference makes relationship: A new visual relationships inference mechanism

Bibliographic Details
Published in: IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC ...) (Online), Vol. 4, pp. 599-605
Main Authors: Yu, Nie; Siyu, Zhu; Guiping, Su; Yuxin, Guo
Format: Conference Proceeding
Language: English
Published: IEEE, 18.06.2021
Summary: To understand visual information better, a machine must move beyond object recognition to a higher level: understanding the relationships between objects. Despite recent advances driven by deep learning, detecting and grounding visual relationships remains a difficult task. In this work, we propose a relational attention model that incorporates appearance differences, aiming to mitigate the long-tail distribution problem of data-driven methods. The appearance difference highlights how an entity in the image differs from other entities of the same category, a distinction that determines the relationship. The model is a customized transformer-based structure that accounts for the influence of the subject, the object, and their co-occurrence on the relationship. Our representation-generation method, built on multi-head attention, effectively models relationships and addresses their multi-label nature. Compared with other state-of-the-art approaches, we achieve an absolute mean improvement in performance on the Visual Genome dataset.
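The summary describes fusing subject, object, and co-occurrence features with multi-head attention to form a relation representation. The paper's actual architecture is not reproduced here; the following is a minimal NumPy sketch of that general idea, in which the "appearance difference" is approximated as a subtraction of feature vectors and all projection matrices are random placeholders for learned parameters (the function names and dimensions are illustrative assumptions, not the authors' code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(tokens, num_heads, rng):
    """One multi-head self-attention layer over a small token set.

    tokens: (T, d) array -- here T = 3 for [subject, object, difference].
    Projection weights are random stand-ins for learned parameters.
    """
    T, d = tokens.shape
    assert d % num_heads == 0
    dh = d // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    # Split each projection into heads: (num_heads, T, dh).
    split = lambda M: M.reshape(T, num_heads, dh).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Scaled dot-product attention per head, then merge heads back to (T, d).
    attn = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    out = (attn @ Vh).transpose(1, 0, 2).reshape(T, d)
    return out @ Wo

def relation_representation(subj, obj, num_heads=4, seed=0):
    """Fuse subject, object, and an appearance-difference feature into
    a single relation vector via multi-head self-attention."""
    rng = np.random.default_rng(seed)
    diff = subj - obj                      # crude stand-in for appearance difference
    tokens = np.stack([subj, obj, diff])   # (3, d)
    fused = multi_head_attention(tokens, num_heads, rng)
    return fused.mean(axis=0)              # pooled relation representation

rng = np.random.default_rng(1)
rel = relation_representation(rng.standard_normal(32), rng.standard_normal(32))
print(rel.shape)  # (32,)
```

In a trained model the pooled vector would feed a multi-label predicate classifier (e.g. per-predicate sigmoids), which is one common way to handle the multi-label aspect the summary mentions.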
ISSN: 2693-2776
DOI: 10.1109/IMCEC51613.2021.9482321