Adaptive information fusion network for multi‐modal personality recognition

Bibliographic Details
Published in: Computer Animation and Virtual Worlds, Vol. 35, No. 3
Main Authors: Bao, Yongtang; Liu, Xiang; Qi, Yue; Liu, Ruijun; Li, Haojie
Format: Journal Article
Language: English
Published: Chichester: Wiley Subscription Services, Inc., 01.05.2024

Summary: Personality recognition is of great significance for deepening the understanding of social relations. While personality recognition methods have made significant strides in recent years, the heterogeneity between modalities during feature fusion remains an open challenge. This paper introduces an adaptive multi-modal information fusion network (AMIF-Net) capable of concurrently processing video, audio, and text data. First, the AMIF-Net encoders process the extracted audio and video features separately, effectively capturing long-term relationships in the data. Then, adaptive elements added to the fusion network alleviate the heterogeneity between modalities. Lastly, the concatenated audio-video and text features are fed into a regression network to obtain Big Five personality trait scores. Furthermore, a novel loss function addresses training inaccuracies by exploiting its property of peaking at the critical mean. Tests on the ChaLearn First Impressions V2 multi-modal dataset show performance that partially surpasses state-of-the-art networks. The features produced by each encoder are optimized and merged for the downstream task, and the Transformer component is substantially enhanced by integrating adaptive attention and automatic learning of cross-modal associations. This not only mitigates outliers and vanishing gradients during training, but is also significant for real-world applications.
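The summary describes the pipeline only at a high level. As a rough illustration, below is a minimal PyTorch sketch of the described flow: Transformer encoders over the audio and video streams, an adaptive audio-video fusion step, concatenation with text features, and a regression head producing the five trait scores, plus a bell-shaped loss. Everything here (the module names, the gating form of the fusion, the bell-shaped loss, and all dimensions) is an assumption made for illustration, not the authors' published implementation.

import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    # Hypothetical adaptive fusion: a learned gate weighs the two modalities
    # per feature, one plausible way to soften cross-modal heterogeneity.
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, audio, video):
        g = self.gate(torch.cat([audio, video], dim=-1))  # gate values in (0, 1)
        return g * audio + (1 - g) * video                # adaptively blended features

class AMIFNetSketch(nn.Module):
    # Illustrative stand-in for AMIF-Net: Transformer encoders for the audio
    # and video streams, adaptive audio-video fusion, then concatenation with
    # text features and a regression head for the five trait scores.
    def __init__(self, dim=256, text_dim=768):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.audio_enc = nn.TransformerEncoder(layer, num_layers=2)  # layers are deep-copied
        self.video_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.fusion = AdaptiveFusion(dim)
        self.head = nn.Sequential(
            nn.Linear(dim + text_dim, dim), nn.ReLU(),
            nn.Linear(dim, 5), nn.Sigmoid(),  # ChaLearn trait labels lie in [0, 1]
        )

    def forward(self, audio_seq, video_seq, text_feat):
        a = self.audio_enc(audio_seq).mean(dim=1)  # pool over time steps
        v = self.video_enc(video_seq).mean(dim=1)
        av = self.fusion(a, v)                     # adaptively fused audio-video feature
        return self.head(torch.cat([av, text_feat], dim=-1))

def bell_loss(pred, target, a=10.0, b=0.1):
    # Hypothetical bell-shaped regression loss: it saturates at a for large
    # errors (limiting the pull of outliers) and its gradient magnitude peaks
    # at error |b|; a guess at the "peak" property the summary mentions.
    err = pred - target
    return (a * (1.0 - torch.exp(-err ** 2 / (2 * b ** 2)))).mean()

A call such as AMIFNetSketch()(torch.randn(2, 50, 256), torch.randn(2, 50, 256), torch.randn(2, 768)) returns a (2, 5) tensor of trait scores; the upstream audio, video, and text feature extractors are omitted from the sketch.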
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2268