IPGN: Interactiveness Proposal Graph Network for Human-Object Interaction Detection

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 30, pp. 6583-6593
Main Authors: Wang, Haoran; Jiao, Licheng; Liu, Fang; Li, Lingling; Liu, Xu; Ji, Deyi; Gan, Weihao
Format: Journal Article
Language: English
Published: New York, IEEE, 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Human-Object Interaction (HOI) Detection is an important task to understand how humans interact with objects. Most of the existing works treat this task as an exhaustive triplet $\langle human, verb, object \rangle$ classification problem. In this paper, we decompose it and propose a novel two-stage graph model to learn the knowledge of interactiveness and interaction in one network, namely, Interactiveness Proposal Graph Network (IPGN). In the first stage, we design a fully connected graph for learning the interactiveness, which distinguishes whether a human-object pair is interactive or not. Concretely, it generates the interactiveness features to encode high-level semantic interactiveness knowledge for each pair. The class-agnostic interactiveness is a more general and simpler objective, which can be used to provide reasonable proposals for the graph construction in the second stage. In the second stage, a sparsely connected graph is constructed with all interactive pairs selected by the first stage. Specifically, we use the interactiveness knowledge to guide the message passing. In contrast to feature similarity, it explicitly represents the connections between the nodes. Benefiting from the valid graph reasoning, the node features are well encoded for interaction learning. Experiments show that the proposed method achieves state-of-the-art performance on both V-COCO and HICO-DET datasets.
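The two-stage idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how interactiveness is scored, how pairs are selected, or how messages are aggregated. The function names, the fixed threshold of 0.5, the precomputed scores, and the weighted-sum update below are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (human index, object index)


def select_interactive_pairs(
    scores: Dict[Pair, float], threshold: float = 0.5
) -> Dict[Pair, float]:
    """Stage 1 (sketch): keep the human-object pairs whose class-agnostic
    interactiveness score reaches a threshold. The threshold is an assumption."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


def message_passing_step(
    human_feats: List[List[float]],
    object_feats: List[List[float]],
    edges: Dict[Pair, float],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Stage 2 (sketch): one round of message passing over the sparse graph.
    Each kept pair is an edge, and its interactiveness score weights the
    message. Human and object features are assumed to share one dimension."""
    new_h = [list(f) for f in human_feats]
    new_o = [list(f) for f in object_feats]
    for (h, o), weight in edges.items():
        for d in range(len(human_feats[h])):
            # Messages are computed from the original features, so the
            # result does not depend on edge order.
            new_h[h][d] += weight * object_feats[o][d]
            new_o[o][d] += weight * human_feats[h][d]
    return new_h, new_o


# Hypothetical scores for all pairs of 2 humans and 2 objects
# (a missing pair is treated as non-interactive).
scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 1): 0.75}
edges = select_interactive_pairs(scores)

humans = [[1.0, 0.0], [0.0, 1.0]]
objects = [[2.0, 2.0], [4.0, 0.0]]
new_h, new_o = message_passing_step(humans, objects, edges)
```

Here the pair (0, 1) is dropped in the first step, so no message flows between human 0 and object 1. In the paper the scores would come from the learned first-stage graph; here they are simply given as inputs.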
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2021.3096333