Tell Me What They're Holding: Weakly-supervised Object Detection with Transferable Knowledge from Human-object Interaction
| Main Authors | , , , |
| --- | --- |
| Format | Journal Article |
| Language | English |
| Published | 19.11.2019 |
| Subjects | |
Summary: In this work, we introduce a novel weakly supervised object detection (WSOD) paradigm that detects objects from rare classes with few examples using transferable knowledge from human-object interactions (HOI). Although WSOD generally shows lower performance than full supervision, we focus on HOI as a context that can strongly supervise complex semantics in images. We therefore propose a novel module called the RRPN (relational region proposal network), which outputs an object-localizing attention map using only human poses and action verbs. In the source domain, we fully train an object detector and the RRPN with full HOI supervision. With the localization knowledge transferred from the trained RRPN, a new object detector can learn unseen objects in the target domain from weak verbal HOI supervision, without bounding box annotations. Because the RRPN is designed as an add-on module, it can be applied not only to object detection but also to other tasks such as semantic segmentation. Experimental results on the HICO-DET dataset show that the proposed method can be a cheap alternative to the current fully supervised object detection paradigm. Moreover, qualitative results demonstrate that our model properly localizes unseen objects on the HICO-DET and V-COCO datasets.
DOI: 10.48550/arxiv.1911.08141
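The abstract describes the RRPN only at a high level: an add-on module that turns human poses and action verbs into an object-localizing attention map. The sketch below is a minimal illustration of that idea, not the authors' implementation; all module names, layer sizes, and the verb vocabulary size are assumptions made for the example.

```python
# Hypothetical sketch of an RRPN-style add-on module: it maps human pose
# keypoints and an action-verb embedding to a spatial attention map that can
# gate a detector's feature map. Shapes and names are illustrative only.
import torch
import torch.nn as nn


class RelationalAttention(nn.Module):
    def __init__(self, num_keypoints=17, verb_vocab=117, embed_dim=64, feat_hw=32):
        super().__init__()
        self.feat_hw = feat_hw
        # Encode the human pose (x, y, visibility per keypoint) into a vector.
        self.pose_encoder = nn.Sequential(
            nn.Linear(num_keypoints * 3, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )
        # Embed the action verb (e.g. "hold", "ride") as a learned vector.
        self.verb_embedding = nn.Embedding(verb_vocab, embed_dim)
        # Decode the fused pose+verb code into a coarse spatial attention map.
        self.decoder = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_hw * feat_hw),
        )

    def forward(self, pose, verb_id):
        # pose: (B, num_keypoints, 3), verb_id: (B,)
        p = self.pose_encoder(pose.flatten(1))
        v = self.verb_embedding(verb_id)
        attn = self.decoder(torch.cat([p, v], dim=1))
        # Sigmoid keeps the map in [0, 1] so it can re-weight detector features.
        return torch.sigmoid(attn).view(-1, 1, self.feat_hw, self.feat_hw)


# Usage: gate a backbone feature map with the predicted attention.
if __name__ == "__main__":
    module = RelationalAttention()
    pose = torch.rand(2, 17, 3)            # dummy COCO-style keypoints
    verb = torch.tensor([4, 11])           # dummy verb indices
    features = torch.rand(2, 256, 32, 32)  # dummy detector features
    attended = features * module(pose, verb)
    print(attended.shape)  # torch.Size([2, 256, 32, 32])
```

Because the attention module only consumes pose and verb inputs and multiplies into an existing feature map, the same add-on pattern could in principle sit in front of a segmentation head as well, which is the flexibility the abstract claims for the RRPN.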