Appearance difference makes relationship: A new visual relationships inference mechanism
Published in | IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC ...) (Online), Vol. 4, pp. 599–605 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 18.06.2021 |
Summary: | To understand visual information better, a machine must move beyond object recognition to a higher level of abstraction: understanding the relationships between objects. Despite recent advances driven by deep learning, detecting and grounding visual relationships remains a difficult task. In this work, we propose a relational attention model that uses appearance differences to mitigate the long-tail distribution problem in data-driven methods. The appearance difference highlights how an entity in an image differs from other entities of the same category, which determines the relationship. The model is a customized transformer-based structure that accounts for the influence of the subject, the object, and their co-occurrence on the relationship. Our multi-head-attention-based representation generation method can effectively model relationships and addresses the multi-label nature of visual relationships. Compared with other state-of-the-art approaches, we achieve an absolute mean improvement in performance on the Visual Genome dataset. |
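The summary describes mixing subject, object, and co-occurrence features through multi-head attention to produce a relationship representation. A minimal sketch of such a mechanism is shown below; the `multi_head_attention` function, the three-token triple layout, and all dimensions are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention split across num_heads heads.
    q, k, v: arrays of shape (seq_len, d_model); d_model % num_heads == 0.
    (Projection matrices are omitted for brevity.)"""
    seq_len, d_model = q.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = q[:, s], k[:, s], v[:, s]
        scores = qh @ kh.T / np.sqrt(d_head)            # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        outputs.append(weights @ vh)                    # (seq_len, d_head)
    return np.concatenate(outputs, axis=-1)             # (seq_len, d_model)

# Hypothetical relation triple: stack subject, object, and a joint
# (co-occurrence) feature as a 3-token sequence and let attention mix them.
rng = np.random.default_rng(0)
subj, obj, joint = (rng.standard_normal((1, 8)) for _ in range(3))
tokens = np.vstack([subj, obj, joint])                  # (3, 8)
rel = multi_head_attention(tokens, tokens, tokens, num_heads=2)
print(rel.shape)  # prints (3, 8)
```

In a real model, each row of `rel` would feed a per-predicate classifier, which is one way the multi-label aspect (one subject-object pair can hold several predicates) could be handled.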
ISSN: | 2693-2776 |
DOI: | 10.1109/IMCEC51613.2021.9482321 |