Toward a Comprehensive Face Detector in the Wild

Bibliographic Details
Published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 29, no. 1, pp. 104-114
Main Authors Li, Jianshu; Liu, Luoqi; Li, Jianan; Feng, Jiashi; Yan, Shuicheng; Sim, Terence
Format Journal Article
Language English
Published New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2019
Summary: In this paper, we aim to build a comprehensive face detection system that provides a one-stop solution to various practical challenges for face detection in realistic scenarios, e.g., detecting faces from multiple views, faces with occlusions, faces with exaggerated expressions, or blurred faces. Moreover, we introduce an automatic data harvesting algorithm that effectively improves the generalization performance of the system even when collecting training faces containing various challenging patterns is difficult. In particular, we introduce three critical components to build the system: a deep convolutional neural network (CNN) of the kind that has recently become widely used, a novel blur-aware bi-channel network architecture, and a new self-learning mechanism capable of exploiting video contexts continuously. All of the aforementioned challenges except detecting blurred faces can potentially be addressed by the CNN component, owing to its robustness to local deformation of target faces. The more challenging problem of detecting blurred faces is addressed by the bi-channel architecture component, which processes blurred and clear faces adaptively. In addition, to address the difficulty of improving the generalization performance of a learning-based face detection system, we introduce a video-context-based self-learning mechanism, which enables the system to continuously enhance its performance by automatically harvesting faces with challenging training patterns. To exploit video context, the detector is applied to massive unlabeled videos, and challenging faces are captured based on temporal inference. These recaptured faces, which generally correspond to one or more of the challenges mentioned above, are fed back into the detection system to further improve its performance. Extensive experiments show that the proposed detection system achieves new state-of-the-art performance on the FDDB data set, the PASCAL face data set, the AFW data set, and the WIDER Face data set.
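The summary does not say how the temporal inference step is carried out. As a purely illustrative sketch, one could assume that a face missed by the detector in a few frames, but detected confidently shortly before and after, can be recaptured by interpolating the bounding box across the gap. The names `Detection`, `interpolate_box`, and `harvest_missed_faces`, and the thresholds `high` and `max_gap`, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    frame: int    # frame index within the video
    box: Box      # bounding box of the face in that frame
    score: float  # detector confidence in [0, 1]


def interpolate_box(a: Box, b: Box, t: float) -> Box:
    """Linearly interpolate between two boxes, with t in [0, 1]."""
    return tuple((1 - t) * p + t * q for p, q in zip(a, b))


def harvest_missed_faces(
    track: List[Detection],
    high: float = 0.9,  # assumed threshold for a "confident" detection
    max_gap: int = 5,   # assumed maximum gap (in frames) to bridge
) -> List[Tuple[int, Box]]:
    """Return (frame, box) pairs for frames lying between two confident
    detections of the same face. These are candidate hard examples."""
    confident = sorted(
        (d for d in track if d.score >= high), key=lambda d: d.frame
    )
    harvested = []
    for prev, nxt in zip(confident, confident[1:]):
        gap = nxt.frame - prev.frame
        if 1 < gap <= max_gap:
            for f in range(prev.frame + 1, nxt.frame):
                t = (f - prev.frame) / gap
                harvested.append((f, interpolate_box(prev.box, nxt.box, t)))
    return harvested
```

In this sketch the harvested boxes would serve as additional training samples for the detector. Linear interpolation is only a stand-in for whatever tracking or inference the actual system uses.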
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2017.2778227