Self-Supervised Learning of Person-Specific Facial Dynamics for Automatic Personality Recognition

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, Vol. 14, No. 1, pp. 178-195
Main Authors: Song, Siyang; Jaiswal, Shashank; Sanchez, Enrique; Tzimiropoulos, Georgios; Shen, Linlin; Valstar, Michel
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2021.3064601

Summary: This article aims to solve two important issues that frequently occur in existing automatic personality analysis systems: (1) attempting to use very short video segments, or even single frames, rather than long-term behaviour to infer personality traits; and (2) the lack of methods to encode person-specific facial dynamics for personality recognition. To deal with these issues, this article first proposes a novel Rank Loss which utilizes the natural temporal evolution of facial actions, rather than personality labels, for self-supervised learning of facial dynamics. Our approach first trains a generic U-Net-style model that can infer general facial dynamics learned from a set of unlabelled face videos. Then, the generic model is frozen, and a set of intermediate filters is incorporated into this architecture. The self-supervised learning is then resumed with only person-specific videos. This way, the learned filters' weights are person-specific, making them a valuable source for modeling person-specific facial dynamics. We then propose to concatenate the weights of the learned filters as a person-specific representation, which can be directly used to predict the personality traits without needing other parts of the network. We evaluate the proposed approach on both self-reported personality and apparent personality datasets. In addition to achieving promising results in the estimation of personality trait scores from videos, we show that the task conducted by the subject in the video matters, that fusing a combination of tasks reaches the highest accuracy, and that multi-scale dynamics are more informative than single-scale dynamics.
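The self-supervised Rank Loss described above relies only on the natural temporal order of frames, not on personality labels. As a rough illustration of that idea, the sketch below implements a generic pairwise temporal ranking loss: a scalar "progress" score predicted for a later frame is encouraged to exceed that of an earlier frame by a margin. The function name, hinge form, and `margin` value are illustrative assumptions, not the paper's exact formulation.

```python
def temporal_rank_loss(scores, margin=0.1):
    """Hedged sketch of a pairwise temporal ranking loss.

    scores: per-frame scalar outputs of a dynamics model, listed in
    temporal order. For every frame pair (i, j) with i < j, the later
    frame's score should exceed the earlier one's by at least `margin`;
    violations incur a hinge penalty. No personality labels are used --
    supervision comes purely from temporal order.
    """
    loss, pairs = 0.0, 0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            # Hinge penalty when the temporal order is not respected.
            loss += max(0.0, margin - (scores[j] - scores[i]))
            pairs += 1
    return loss / pairs if pairs else 0.0
```

A monotonically increasing score sequence incurs zero loss, while a reversed sequence is penalized; minimizing this loss over unlabelled videos drives the model to encode facial dynamics, which is the role the Rank Loss plays in both the generic pre-training stage and the person-specific fine-tuning stage.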