Joint face and head tracking inside multi-camera smart rooms

Bibliographic Details
Published in Signal, image and video processing Vol. 1; no. 2; pp. 163 - 178
Main Authors Zhang, Zhenqiu; Potamianos, Gerasimos; Senior, Andrew W.; Huang, Thomas S.
Format Journal Article
Language English
Published Heidelberg Springer Nature B.V 01.06.2007
Summary: The paper introduces a novel detection and tracking system that provides both frame-view and world-coordinate human location information, based on video from multiple synchronized and calibrated cameras with overlapping fields of view. The system is developed and evaluated for the specific scenario of a seminar lecturer presenting in front of an audience inside a “smart room”, its aim being to track the lecturer’s head centroid in the three-dimensional (3D) space and also yield two-dimensional (2D) face information in the available camera views. The proposed approach is primarily based on a statistical appearance model of human faces by means of well-known AdaBoost-like face detectors, extended to address the head pose variation observed in the smart room scenario of interest. The appearance module is complemented by two novel components and assisted by a simple tracking drift detection mechanism. The first component of interest is the initialization module, which employs a spatio-temporal dynamic programming approach with appropriate penalty functions to obtain optimal 3D location hypotheses. The second is an adaptive subspace learning based 2D tracking scheme with a novel forgetting mechanism, introduced to reduce tracking drift and increase robustness. System performance is benchmarked on an extensive database of realistic human interaction in the lecture smart room scenario, collected as part of the European integrated project “CHIL”. The system consistently achieves excellent tracking precision, with a 3D mean tracking error of less than 16 cm, and is demonstrated to outperform four alternative tracking schemes. Furthermore, the proposed system performs relatively well in detecting frontal and near-frontal faces in the available frame views.
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-007-0018-3