Toward a Comprehensive Face Detector in the Wild
Published in | IEEE transactions on circuits and systems for video technology Vol. 29; no. 1; pp. 104 - 114 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.01.2019; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Summary: | In this paper, we aim to build a comprehensive face detection system that provides a one-stop solution to various practical challenges for face detection in realistic scenarios, e.g., detecting faces from multiple views, faces with occlusions, faces with exaggerated expressions, or blurred faces. Moreover, we introduce an automatic data harvesting algorithm that effectively improves the generalization performance of the system even when collecting training faces containing various challenging patterns is difficult. In particular, we introduce three critical components to build the system, i.e., a widely used deep convolutional neural network (CNN), a novel blur-aware bi-channel network architecture, and a new self-learning mechanism capable of exploiting video contexts continuously. The aforementioned challenges, except for detecting blurred faces, can potentially be addressed by the CNN component owing to its robustness to local deformation of target faces. The more challenging problem of detecting blurred faces is addressed by the bi-channel architecture component, which processes blurred and clear faces adaptively. In addition, to address the difficulty of improving the generalization performance of the learning-based face detection system, we introduce a video-context-based self-learning mechanism, which enables the system to continuously enhance its performance by automatically harvesting faces with challenging training patterns. To exploit video context, the detector is applied to massive unlabeled videos, and challenging faces are captured based on temporal inference. These recaptured faces, which generally correspond to one or more of the challenges mentioned above, are fed back into the detection system to further improve its performance. Extensive experiments show that the proposed detection system achieves new state-of-the-art performance on the FDDB, PASCAL Face, AFW, and WIDER Face data sets. |
---|---|
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2017.2778227 |
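
The summary says that challenging faces are captured from unlabeled video "based on temporal inference" but this record gives no details of how. The Python sketch below is one possible reading, not the authors' algorithm: it assumes that a face detected in frames t-1 and t+1 at nearly the same location, but missed in frame t, is a hard example worth harvesting. The function names (`iou`, `midpoint`, `harvest_missed_faces`) and the thresholds `link_iou` and `miss_iou` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def midpoint(a: Box, b: Box) -> Box:
    """Linear interpolation halfway between two boxes."""
    return (
        (a[0] + b[0]) / 2,
        (a[1] + b[1]) / 2,
        (a[2] + b[2]) / 2,
        (a[3] + b[3]) / 2,
    )


def harvest_missed_faces(
    frames: List[List[Box]],
    link_iou: float = 0.5,  # assumed: min overlap to link t-1 with t+1
    miss_iou: float = 0.3,  # assumed: max overlap that still counts as a miss
) -> List[Tuple[int, Box]]:
    """Return (frame_index, box) pairs for faces the detector likely missed.

    frames[t] holds the confident detections in frame t. A face seen in both
    neighbouring frames but absent from frame t is recovered by interpolation.
    """
    harvested: List[Tuple[int, Box]] = []
    for t in range(1, len(frames) - 1):
        for prev_box in frames[t - 1]:
            for next_box in frames[t + 1]:
                if iou(prev_box, next_box) < link_iou:
                    continue  # neighbours do not look like the same face
                expected = midpoint(prev_box, next_box)
                # Harvest only if nothing in frame t covers the expected spot.
                if all(iou(expected, d) < miss_iou for d in frames[t]):
                    harvested.append((t, expected))
    return harvested


if __name__ == "__main__":
    # The middle frame has no detection, so one interpolated box is harvested.
    video = [[(10.0, 10.0, 50.0, 50.0)], [], [(12.0, 10.0, 52.0, 50.0)]]
    print(harvest_missed_faces(video))
```

In a self-learning loop of the kind the summary describes, the harvested crops would be added to the training set and the detector retrained. A real system would also need longer-range tracking and a safeguard against propagating false positives, both of which this sketch omits.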