Multi-Modal Cross-Subject Emotion Feature Alignment and Recognition with EEG and Eye Movements

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, Vol. 16, No. 3, pp. 1-15
Main Authors: Zhu, Qi; Zhu, Ting; Fei, Lunke; Zheng, Chuhang; Shao, Wei; Zhang, David; Zhang, Daoqiang
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2025.3554399

Summary: Multi-modal emotion recognition has attracted much attention in human-computer interaction because it provides complementary information to the recognition model. However, distribution drift among subjects and the heterogeneity of different modalities pose challenges to multi-modal emotion recognition, limiting its practical application. Most current multi-modal emotion recognition methods struggle to suppress these uncertainties during fusion. In this paper, we propose a cross-subject multi-modal emotion recognition framework that jointly learns subject-independent representations and common features between EEG and eye movements. First, we design a dynamic adversarial domain adaptation scheme for cross-subject distribution alignment, dynamically selecting source domains during training. Second, we simultaneously capture intra-modal and inter-modal emotion-related features with self-attention and cross-attention mechanisms, obtaining a robust and complementary representation of emotional information. Then, two contrastive loss functions are imposed on the above network to further reduce inter-modal heterogeneity and to mine higher-order semantic similarity between synchronously collected multi-modal data. Finally, we use the output of the softmax layer as the predicted value. Experimental results on several multi-modal emotion datasets with EEG and eye movements demonstrate that our method is significantly superior to state-of-the-art emotion recognition approaches. Our code is available at: https://github.com/xbrainnet/CSMM.
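
Note: The abstract describes the attention-based fusion and contrastive alignment only at a high level; the authors' actual implementation is in the linked repository. The sketch below is a minimal, self-contained PyTorch illustration of what such a pipeline can look like: per-modality self-attention, cross-attention between EEG and eye-movement features, a softmax classifier, and an InfoNCE-style contrastive loss on the paired embeddings. All module names, feature dimensions, class counts, and the specific contrastive loss are assumptions made for illustration, not the paper's architecture, and the dynamic adversarial domain adaptation stage is omitted.

# Illustrative sketch only: attention-based EEG / eye-movement fusion with a
# contrastive alignment loss, loosely following the steps named in the abstract.
# Dimensions, module layout, and the InfoNCE loss are assumptions, NOT the
# authors' implementation (see https://github.com/xbrainnet/CSMM).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusion(nn.Module):
    """Self-attention within each modality, cross-attention between them."""

    def __init__(self, eeg_dim=310, eye_dim=33, d_model=128, n_heads=4, n_classes=3):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.eye_proj = nn.Linear(eye_dim, d_model)
        self.eeg_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.eye_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_eeg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # EEG queries eye features
        self.cross_eye = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # eye queries EEG features
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, eeg, eye):
        # eeg: (batch, seq, eeg_dim), eye: (batch, seq, eye_dim)
        e = self.eeg_proj(eeg)
        o = self.eye_proj(eye)
        # Intra-modal features via self-attention.
        e_intra, _ = self.eeg_self(e, e, e)
        o_intra, _ = self.eye_self(o, o, o)
        # Inter-modal features via cross-attention: each modality attends to the other.
        e_cross, _ = self.cross_eeg(e_intra, o_intra, o_intra)
        o_cross, _ = self.cross_eye(o_intra, e_intra, e_intra)
        # Pool over the sequence and fuse by concatenation.
        z_eeg = e_cross.mean(dim=1)
        z_eye = o_cross.mean(dim=1)
        logits = self.classifier(torch.cat([z_eeg, z_eye], dim=-1))
        return logits, z_eeg, z_eye


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss: synchronously collected EEG and eye-movement
    samples at the same batch position are treated as positive pairs."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    model = CrossModalFusion()
    eeg = torch.randn(8, 10, 310)          # toy batch: 8 samples, 10 time steps
    eye = torch.randn(8, 10, 33)
    labels = torch.randint(0, 3, (8,))
    logits, z_eeg, z_eye = model(eeg, eye)
    loss = F.cross_entropy(logits, labels) + info_nce(z_eeg, z_eye)
    loss.backward()
    print(float(loss))

In this toy setup the classification loss trains the softmax output while the contrastive term pulls paired EEG and eye-movement embeddings together, which is one common way to reduce inter-modal heterogeneity; the paper additionally uses a second contrastive loss and adversarial domain adaptation, which are not reproduced here.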