Scalable Semi-Automatic Annotation for Multi-Camera Person Tracking
| Published in | IEEE Transactions on Image Processing, Vol. 25, No. 5, pp. 2259–2274 |
|---|---|
| Main Authors | , , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | United States: IEEE, 01.05.2016 (The Institute of Electrical and Electronics Engineers, Inc.) |
| Subjects | |
Summary: This paper proposes a generic methodology for the semi-automatic generation of reliable position annotations for evaluating multi-camera people-trackers on large video data sets. Most of the annotation data are computed automatically by estimating a consensus tracking result from multiple existing trackers and people detectors and classifying it as either reliable or not. A small subset of the data, composed of tracks with insufficient reliability, is verified by a human using a simple binary decision task, a process faster than marking the correct person position. The proposed framework is generic and can handle additional trackers. We present results on a data set of ~6 h captured by 4 cameras, featuring a person in a holiday flat performing activities such as walking, cooking, eating, cleaning, and watching TV. When aiming for a tracking accuracy of 60 cm, 80% of all video frames are annotated automatically. The annotations for the remaining 20% of the frames were added after human verification of an automatically selected subset of the data, which required ~2.4 h of manual labor. In a subsequent comprehensive visual inspection of the annotation procedure, we found 99% of the automatically annotated frames to be correct. We provide guidelines on how to apply the proposed methodology to new data sets, as well as an exploratory study for the multi-target case, applied to existing and new benchmark video sequences.
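The workflow outlined in the summary (fuse the outputs of several trackers into a consensus, accept frames where the trackers agree, and route the rest to a binary human check) can be illustrated with a short, hypothetical Python sketch. The coordinate-wise-median fusion, the function and parameter names, and the agreement thresholds below are illustrative assumptions rather than the paper's actual algorithm; only the 60 cm radius mirrors the target accuracy quoted in the summary.

```python
import numpy as np


def consensus_with_reliability(tracker_positions, agreement_radius_cm=60.0, min_agreeing=2):
    """Hypothetical sketch: fuse per-frame ground-plane estimates from several
    trackers into a consensus position and flag frames whose estimates disagree
    as needing human verification.

    tracker_positions: dict mapping tracker name -> (x, y) in cm, or None when a
    tracker produced no output for this frame.
    Returns (consensus_xy or None, reliable).
    """
    points = np.array(
        [p for p in tracker_positions.values() if p is not None], dtype=float
    )
    if points.size == 0:
        return None, False

    # Coordinate-wise median as a simple robust fusion of the available estimates.
    consensus = np.median(points, axis=0)

    # A tracker "agrees" if its estimate lies within the target accuracy of the consensus.
    distances = np.linalg.norm(points - consensus, axis=1)
    agreeing = int(np.sum(distances <= agreement_radius_cm))

    # Reliable frames are kept automatically; unreliable ones would go to the
    # binary human-verification step described in the summary.
    reliable = agreeing >= min_agreeing
    return tuple(map(float, consensus)), reliable


if __name__ == "__main__":
    frame = {
        "tracker_a": (120.0, 240.0),   # cm, ground-plane coordinates
        "tracker_b": (135.0, 250.0),
        "detector_c": (410.0, 90.0),   # an outlier estimate
    }
    print(consensus_with_reliability(frame))  # -> ((135.0, 240.0), True)
```

Using a robust statistic such as the median keeps a single outlier tracker from corrupting the consensus, which is the intuition behind accepting only frames where several independent estimates agree and deferring the rest to a quick human yes/no check.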
ISSN: 1057-7149 (print); 1941-0042 (electronic)
DOI: 10.1109/TIP.2016.2542021