Robust Face Recognition via Multimodal Deep Face Representation

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 17, No. 11, pp. 2049-2058
Main Authors: Ding, Changxing; Tao, Dacheng
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2015
Summary: Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. The extracted features are then concatenated to form a high-dimensional feature vector, whose dimension is compressed by the SAE. All of the CNNs are trained on a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a 98.43% verification rate is achieved on the LFW database. Benefiting from the complementary information contained in the multimodal data, our small ensemble system achieves a recognition rate higher than 99.0% on LFW using a publicly available training set.
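
The summary above outlines the overall data flow: several CNN branches extract complementary features from multimodal face inputs, the per-branch features are concatenated into one high-dimensional vector, and a three-layer stacked auto-encoder compresses that vector into a compact face representation. The following is a minimal sketch of that pipeline in PyTorch; the number of branches, layer sizes, activation choices, and input resolution are illustrative assumptions, not the configuration reported in the paper.

# Sketch of the pipeline described in the summary (not the authors' implementation):
# several CNNs -> concatenated feature vector -> three-layer stacked auto-encoder.
import torch
import torch.nn as nn


class SimpleFaceCNN(nn.Module):
    """One branch of the ensemble; extracts a feature vector from one face modality."""

    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (N, 64, 1, 1)
        )
        self.fc = nn.Linear(64, feature_dim)  # per-branch face descriptor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))


class StackedAutoEncoder(nn.Module):
    """Three-layer encoder that compresses the concatenated multimodal feature."""

    def __init__(self, in_dim: int, hidden_dims=(2048, 1024, 512)):
        super().__init__()
        dims = [in_dim, *hidden_dims]
        # Three Linear + Sigmoid stages; hidden sizes are illustrative assumptions.
        self.encoder = nn.Sequential(*[
            layer
            for i in range(3)
            for layer in (nn.Linear(dims[i], dims[i + 1]), nn.Sigmoid())
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


# Hypothetical ensemble over four modalities (e.g., different crops or landmark patches).
branches = nn.ModuleList([SimpleFaceCNN(512) for _ in range(4)])
sae = StackedAutoEncoder(in_dim=4 * 512)

images = [torch.randn(8, 3, 100, 100) for _ in range(4)]   # one batch per modality
features = torch.cat([cnn(x) for cnn, x in zip(branches, images)], dim=1)
compact = sae(features)   # (8, 512) compressed face representation

In practice the compressed representations would be compared with a similarity metric (e.g., cosine distance) for face verification, as in the LFW evaluation mentioned in the summary.
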
ISSN: 1520-9210
EISSN: 1941-0077
DOI: 10.1109/TMM.2015.2477042