DisP+V: A Unified Framework for Disentangling Prototype and Variation From Single Sample per Person

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, no. 2, pp. 867-881
Main Authors	Pang, Meng; Wang, Binghui; Ye, Mang; Cheung, Yiu-ming; Chen, Yiran; Wen, Bihan
Format	Journal Article
Language	English
Published	United States: IEEE, 01.02.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adversarial learning; Algorithms; Coders; Discriminators; disentangled representation; Face; face editing; Face recognition; Faces; Feature extraction; Generators; Humans; Image manipulation; Image reconstruction; Interpolation; Learning systems; Neural Networks, Computer; Pattern recognition; Pattern Recognition, Automated - methods; prototype recovery; Prototypes; Semantics; single sample per person; Variation
Summary:	Single sample per person face recognition (SSPP FR) is one of the most challenging problems in FR due to the extreme lack of enrolment data. To date, the most popular SSPP FR methods are the generic learning methods, which recognize query face images based on the so-called prototype plus variation (i.e., P+V) model. However, the classic P+V model suffers from two major limitations: 1) it linearly combines the prototype and variation images in the observational pixel-spatial space and cannot generalize to multiple nonlinear variations, e.g., poses, which are common in face images; and 2) it is severely impaired when the enrolment face images are contaminated by nuisance variations. To address these two limitations, it is desirable to disentangle the prototype and variation in a latent feature space and to manipulate the images in a semantic manner. To this end, we propose a novel disentangled prototype plus variation model, dubbed DisP+V, which consists of an encoder-decoder generator and two discriminators. The generator and discriminators play two adversarial games such that the generator nonlinearly encodes the images into a latent semantic space, where the more discriminative prototype feature and the less discriminative variation feature are disentangled. Meanwhile, the prototype and variation features guide the generator to generate an identity-preserved prototype and the corresponding variation, respectively. Experiments on various real-world face datasets demonstrate the superiority of our DisP+V model over the classic P+V model for SSPP FR. Furthermore, DisP+V demonstrates its unique characteristics in both prototype recovery and face editing/interpolation.
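Illustrative note: the classic P+V idea that the summary contrasts with DisP+V can be sketched in a few lines. The sketch below is not the authors' method and is not taken from the paper. It assumes each enrolled prototype and each generic variation atom is a flattened pixel vector, and it fits the variation coefficients by plain least squares, whereas published generic learning methods typically add sparsity or other regularization. All names (`residual`, `classify`, the toy vectors) are hypothetical.

```python
# Minimal sketch of a linear prototype-plus-variation (P+V) classifier.
# Assumptions (not from the paper): images are flattened pixel vectors, the
# variation atoms are linearly independent, and coefficients are fitted by
# ordinary least squares rather than a sparse or regularized solver.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve a small system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def residual(query, prototype, variations):
    """Squared error left after explaining (query - prototype) with the variation atoms."""
    diff = [q - p for q, p in zip(query, prototype)]
    gram = [[dot(u, v) for v in variations] for u in variations]
    rhs = [dot(u, diff) for u in variations]
    beta = solve(gram, rhs)  # least-squares variation coefficients
    recon = [sum(b * v[i] for b, v in zip(beta, variations)) for i in range(len(diff))]
    return sum((d - r) ** 2 for d, r in zip(diff, recon))

def classify(query, prototypes, variations):
    """Pick the identity whose prototype plus a linear variation best fits the query."""
    return min(prototypes, key=lambda name: residual(query, prototypes[name], variations))

# Toy data: two enrolled identities and two generic variation atoms.
prototypes = {"alice": [1.0, 0.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0, 0.0]}
variations = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.5, -0.2]  # alice's prototype plus a mix of both atoms
predicted = classify(query, prototypes, variations)
```

Because the combination is a sum in pixel space, a variation such as a pose change, which moves pixels rather than adding to them, is not captured by any choice of coefficients. That is the first limitation named in the summary.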
ISSN:	2162-237X; 2162-2388
DOI:	10.1109/TNNLS.2021.3103194