Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting
Since collecting and annotating data for spatio-temporal action detection is very expensive, there is a need to learn approaches with less supervision. Weakly supervised approaches do not require any bounding box annotations and can be trained only from labels that indicate whether an action occurs...
Saved in:
Main Authors | , |
---|---|
Format | Journal Article |
Language | English |
Published |
21.01.2021
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Since collecting and annotating data for spatio-temporal action detection is
very expensive, there is a need to learn approaches with less supervision.
Weakly supervised approaches do not require any bounding box annotations and
can be trained only from labels that indicate whether an action occurs in a
video clip. Current approaches, however, cannot handle the case when there are
multiple persons in a video that perform multiple actions at the same time. In
this work, we address this very challenging task for the first time. We propose
a baseline based on multi-instance and multi-label learning. Furthermore, we
propose a novel approach that uses sets of actions as representation instead of
modeling individual action classes. Since computing, the probabilities for the
full power set becomes intractable as the number of action classes increases,
we assign an action set to each detected person under the constraint that the
assignment is consistent with the annotation of the video clip. We evaluate the
proposed approach on the challenging AVA dataset where the proposed approach
outperforms the MIML baseline and is competitive to fully supervised
approaches. |
---|---|
DOI: | 10.48550/arxiv.2101.08567 |