HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
| Field | Value |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | 26.12.2017 |
| Subjects | |
| Online Access | Get full text |
Summary: This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage both consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. The resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations.
DOI: 10.48550/arxiv.1712.09374
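
The abstract above mentions mining candidate clips by combining consensus and disagreement among several visual classifiers before passing them to human annotators. The snippet below is only a minimal, illustrative sketch of that general idea, not the authors' pipeline: the function name, the score matrix, and the thresholds are all hypothetical assumptions made for the example.

```python
"""Illustrative sketch (not the authors' code): selecting candidate clips for
human validation from the outputs of several classifiers. All names,
thresholds, and the scoring interface are hypothetical."""

import numpy as np


def mine_candidate_clips(clip_scores, agree_thresh=0.8, disagree_margin=0.5):
    """Split clips into consensus and disagreement candidates.

    clip_scores: array of shape (num_clips, num_classifiers) holding each
        classifier's confidence that a clip shows the target action.
    Returns two lists of clip indices: likely positives (all classifiers
    confident) and ambiguous clips (classifiers strongly disagree). Both
    groups would then be sent to human annotators for validation.
    """
    scores = np.asarray(clip_scores)
    min_score = scores.min(axis=1)
    max_score = scores.max(axis=1)

    # Consensus: every classifier is confident the action is present.
    consensus = np.where(min_score >= agree_thresh)[0]

    # Disagreement: classifiers give widely different confidences,
    # which tends to surface hard or confusing examples.
    disagreement = np.where(max_score - min_score >= disagree_margin)[0]

    return consensus.tolist(), disagreement.tolist()


if __name__ == "__main__":
    # Toy scores for 4 clips from 3 hypothetical classifiers.
    demo = [[0.90, 0.85, 0.95],   # consensus positive
            [0.10, 0.90, 0.20],   # disagreement -> ambiguous candidate
            [0.05, 0.10, 0.00],   # consensus negative (ignored here)
            [0.70, 0.10, 0.80]]   # disagreement
    pos, ambiguous = mine_candidate_clips(demo)
    print("consensus candidates:", pos)        # [0]
    print("disagreement candidates:", ambiguous)  # [1, 3]
```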