Self-Supervised Learning of Person-Specific Facial Dynamics for Automatic Personality Recognition

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, Vol. 14, No. 1, pp. 178-195
Main Authors: Song, Siyang; Jaiswal, Shashank; Sanchez, Enrique; Tzimiropoulos, Georgios; Shen, Linlin; Valstar, Michel
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2021.3064601

Summary: This article aims to solve two important issues that frequently occur in existing automatic personality analysis systems: (1) attempting to use very short video segments, or even single frames, rather than long-term behaviour to infer personality traits; and (2) the lack of methods to encode person-specific facial dynamics for personality recognition. To deal with these issues, this article first proposes a novel Rank Loss which utilizes the natural temporal evolution of facial actions, rather than personality labels, for self-supervised learning of facial dynamics. Our approach first trains a generic U-Net-style model that can infer general facial dynamics learned from a set of unlabelled face videos. Then, the generic model is frozen, and a set of intermediate filters is incorporated into this architecture. The self-supervised learning is then resumed with only person-specific videos. This way, the learned filters' weights are person-specific, making them a valuable source for modeling person-specific facial dynamics. We then propose to concatenate the weights of the learned filters as a person-specific representation, which can be directly used to predict the personality traits without needing other parts of the network. We evaluate the proposed approach on both self-reported personality and apparent personality datasets. In addition to achieving promising results in the estimation of personality trait scores from videos, we show that the task conducted by the subject in the video matters, that fusing a combination of tasks reaches the highest accuracy, and that multi-scale dynamics are more informative than single-scale dynamics.
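The self-supervised Rank Loss described above relies only on the natural temporal order of frames, not on personality labels. As a rough illustration of that idea, the sketch below implements a generic pairwise temporal ranking loss: a scalar "progress" score predicted for a later frame is encouraged to exceed that of an earlier frame by a margin. The function name, hinge form, and `margin` value are illustrative assumptions, not the paper's exact formulation.

```python
def temporal_rank_loss(scores, margin=0.1):
    """Hedged sketch of a pairwise temporal ranking loss.

    scores: per-frame scalar outputs of a dynamics model, listed in
    temporal order. For every frame pair (i, j) with i < j, the later
    frame's score should exceed the earlier one's by at least `margin`;
    violations incur a hinge penalty. No personality labels are used --
    supervision comes purely from temporal order.
    """
    loss, pairs = 0.0, 0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            # Hinge penalty when the temporal order is not respected.
            loss += max(0.0, margin - (scores[j] - scores[i]))
            pairs += 1
    return loss / pairs if pairs else 0.0
```

A monotonically increasing score sequence incurs zero loss, while a reversed sequence is penalized; minimizing this loss over unlabelled videos drives the model to encode facial dynamics, which is the role the Rank Loss plays in both the generic pre-training stage and the person-specific fine-tuning stage.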