RGB-D Face Recognition With Identity-Style Disentanglement and Depth Augmentation

Bibliographic Details
Published in: IEEE Transactions on Biometrics, Behavior, and Identity Science, Vol. 5, No. 3, pp. 334-347
Main Authors: Chiu, Meng-Tzu; Cheng, Hsun-Ying; Wang, Chien-Yi; Lai, Shang-Hong
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2023

Summary: Deep learning approaches achieve highly accurate face recognition by training models on very large face image datasets. Unlike 2D face image datasets, there is a lack of large 3D face datasets available to the public. Existing public 3D face datasets were usually collected from only a few subjects, leading to over-fitting. This paper proposes two CNN models to improve the RGB-D face recognition task. The first is a segmentation-aware depth estimation network, called DepthNet, which estimates depth maps from RGB face images by exploiting semantic segmentation for more accurate face region localization. The other is a novel segmentation-guided RGB-D face recognition model that contains an RGB recognition branch, a depth map recognition branch, and an auxiliary segmentation mask branch. In our multi-modality face recognition model, a feature disentanglement scheme is employed to factorize the feature representation into identity-related and style-related components. DepthNet is applied to augment a large 2D face image dataset into a large RGB-D face dataset, which is used for training our RGB-D face recognition model. Our experimental results show that DepthNet can produce more reliable depth maps from face images with the segmentation mask. Our multi-modality face recognition model fully exploits the depth map and outperforms state-of-the-art methods on several public 3D face datasets with challenging variations.
ISSN: 2637-6407
DOI: 10.1109/TBIOM.2022.3233769
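
The summary above describes a two-branch RGB-D recognition model whose features are factorized into identity-related and style-related components. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released code: the small encoders, the feature dimensions, and the simple split-and-concatenate disentanglement are illustrative assumptions, and the auxiliary segmentation mask branch and DepthNet are omitted.

# Hypothetical sketch of a two-branch RGB-D recognizer with identity/style
# feature disentanglement (assumes PyTorch; all architectural details are
# placeholders, not the paper's actual backbones or losses).
import torch
import torch.nn as nn


def conv_encoder(in_ch: int, feat_dim: int) -> nn.Sequential:
    # Tiny CNN encoder standing in for the RGB / depth recognition branches.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, feat_dim),
    )


class RGBDDisentangledRecognizer(nn.Module):
    # Each modality's feature vector is split into an identity-related half and a
    # style-related half; only the identity halves are fused for classification.
    def __init__(self, feat_dim: int = 256, num_ids: int = 1000):
        super().__init__()
        self.rgb_branch = conv_encoder(3, feat_dim)
        self.depth_branch = conv_encoder(1, feat_dim)
        self.id_dim = feat_dim // 2
        self.classifier = nn.Linear(2 * self.id_dim, num_ids)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        f_rgb = self.rgb_branch(rgb)
        f_depth = self.depth_branch(depth)
        # Disentangle: first half = identity-related, second half = style-related.
        id_rgb, style_rgb = f_rgb[:, :self.id_dim], f_rgb[:, self.id_dim:]
        id_depth, style_depth = f_depth[:, :self.id_dim], f_depth[:, self.id_dim:]
        logits = self.classifier(torch.cat([id_rgb, id_depth], dim=1))
        return logits, (id_rgb, style_rgb, id_depth, style_depth)


if __name__ == "__main__":
    model = RGBDDisentangledRecognizer()
    rgb = torch.randn(2, 3, 112, 112)    # RGB face crops
    depth = torch.randn(2, 1, 112, 112)  # depth maps (e.g., estimated by a depth network)
    logits, feats = model(rgb, depth)
    print(logits.shape)  # torch.Size([2, 1000])

In the paper's setting, the depth input would come from depth maps estimated by DepthNet on a large 2D face dataset, so the recognizer can be trained on RGB-D pairs at 2D-dataset scale; the sketch simply shows where such depth maps would enter the model.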