Scalable Semi-Automatic Annotation for Multi-Camera Person Tracking


Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 25, No. 5, p. 2259
Main Authors: Nino, Jorge; Frias-Velazquez, Andres; Bo Bo, Nyan; Slembrouck, Maarten; Guan, Junzhi; Debard, Glen; Vanrumste, Bart; Tuytelaars, Tinne; Philips, Wilfried
Format: Journal Article
Language: English
Published: United States, 01.05.2016

Summary: This paper proposes a generic methodology for the semi-automatic generation of reliable position annotations for evaluating multi-camera people-trackers on large video datasets. Most of the annotation data is computed automatically, by estimating a consensus tracking result from multiple existing trackers and people detectors and classifying it as either reliable or not. A small subset of the data, composed of tracks with insufficient reliability, is verified by a human using a simple binary decision task, a process faster than marking the correct person position. The proposed framework is generic and can handle additional trackers. We present results on a dataset of approximately 6 hours of video captured by 4 cameras, featuring a person in a holiday flat performing activities such as walking, cooking, eating, cleaning, and watching TV. When aiming for a tracking accuracy of 60 cm, 80% of all video frames are automatically annotated. The annotations for the remaining 20% of the frames were added after human verification of an automatically selected subset of data, which involved about 2.4 hours of manual labour. A subsequent comprehensive visual inspection of the annotation procedure found 99% of the automatically annotated frames to be correct. We provide guidelines on how to apply the proposed methodology to new datasets, as well as an exploratory study for the multi-target case, applied to existing and new benchmark video sequences.
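The consensus-and-verification idea in the summary can be illustrated with a minimal sketch. The paper's actual consensus estimator and reliability classifier are more elaborate; here, as an assumption for illustration only, consensus is the per-axis median of the tracker estimates, and a frame is flagged for human binary verification when any tracker deviates from the consensus by more than a threshold tied to the 60 cm accuracy target. The function name and threshold are hypothetical, not from the paper.

```python
from statistics import median

def consensus_with_reliability(estimates, max_spread=0.6):
    """Fuse per-frame (x, y) ground-plane position estimates from several trackers.

    estimates: list of (x, y) tuples in metres, one per tracker.
    Returns (consensus_xy, reliable): `reliable` is False when any tracker
    disagrees with the consensus by more than `max_spread` metres, i.e. the
    frame would be routed to a human for a binary accept/reject check.
    (Illustrative sketch only; the paper's method differs in detail.)
    """
    xs = [p[0] for p in estimates]
    ys = [p[1] for p in estimates]
    cx, cy = median(xs), median(ys)
    # Spread: worst-case Euclidean distance of any estimate from the consensus.
    spread = max(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in estimates)
    return (cx, cy), spread <= max_spread

# Three trackers roughly agree -> frame can be annotated automatically.
pos, ok = consensus_with_reliability([(1.0, 2.0), (1.1, 2.1), (0.9, 1.9)])

# One tracker drifts far from the others -> frame is sent to a human verifier.
pos2, ok2 = consensus_with_reliability([(1.0, 2.0), (1.0, 2.0), (3.0, 2.0)])
```

Using the median rather than the mean keeps a single drifting tracker from pulling the consensus position away from the agreeing majority, which matches the summary's goal of only escalating genuinely ambiguous frames to the human verifier.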
ISSN: 1941-0042