Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition
We address the problem of Visual Relationship Detection (VRD) which aims to describe the relationships between pairs of objects in the form of triplets of (subject, predicate, object). We observe that given a pair of bounding box proposals, objects often participate in multiple relations implying th...
Saved in:
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published |
22.11.2019
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | We address the problem of Visual Relationship Detection (VRD) which aims to
describe the relationships between pairs of objects in the form of triplets of
(subject, predicate, object). We observe that given a pair of bounding box
proposals, objects often participate in multiple relations implying the
distribution of triplets is multimodal. We leverage the strong correlations
within triplets to learn the joint distribution of triplet variables
conditioned on the image and the bounding box proposals, doing away with the
hitherto used independent distribution of triplets. To make learning the
triplet joint distribution feasible, we introduce a novel technique of learning
conditional triplet distributions in the form of their normalized low rank
non-negative tensor decompositions. Normalized tensor decompositions take form
of mixture distributions of discrete variables and thus are able to capture
multimodality. This allows us to efficiently learn higher order discrete
multimodal distributions and at the same time keep the parameter size
manageable. We further model the probability of selecting an object proposal
pair and include a relation triplet prior in our model. We show that each part
of the model improves performance and the combination outperforms
state-of-the-art score on the Visual Genome (VG) and Visual Relationship
Detection (VRD) datasets. |
---|---|
DOI: | 10.48550/arxiv.1911.09895 |