Multi-Scale Masked Autoencoders for Cross-Session Emotion Recognition

Bibliographic Details
Published in: IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 32, pp. 1637-1646
Main Authors: Pang, Miaoqi; Wang, Hongtao; Huang, Jiayang; Vong, Chi-Man; Zeng, Zhiqiang; Chen, Chuangquan
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

Summary: Affective brain-computer interfaces (aBCIs) have found widespread application, with remarkable advances in the use of electroencephalogram (EEG) technology for emotion recognition. However, the time-consuming process of annotating EEG data, inherent individual differences, the non-stationary characteristics of EEG signals, and noise artifacts introduced during EEG data collection pose formidable challenges for developing subject-specific cross-session emotion recognition models. To address these challenges simultaneously, we propose a unified pre-training framework based on multi-scale masked autoencoders (MSMAE), which leverages large-scale unlabeled EEG signals from multiple subjects and sessions to extract noise-robust, subject-invariant, and temporally invariant features. We subsequently fine-tune the resulting generalized representations with only a small amount of labeled data from a specific subject for personalization, enabling cross-session emotion recognition. Our framework emphasizes: 1) multi-scale representation to capture diverse aspects of EEG signals and obtain comprehensive information; 2) an improved masking mechanism for robust channel-level representation learning, addressing missing-channel issues while preserving inter-channel relationships; and 3) invariance learning over regional correlations in the spatial-level representation, minimizing inter-subject and inter-session variance. Under these designs, the proposed MSMAE exhibits a remarkable ability to decode emotional states from a different session of EEG data during the testing phase. Extensive experiments on two publicly available datasets, SEED and SEED-IV, demonstrate that MSMAE consistently achieves stable results and outperforms competitive baseline methods in cross-session emotion recognition.
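
The pre-train-then-fine-tune workflow described in the summary can be illustrated with a minimal PyTorch sketch: channel-level masked reconstruction on unlabeled EEG features, followed by attaching a small classifier and fine-tuning on a few labeled trials from the target subject. This is a hypothetical illustration of the general masked-autoencoder idea only; the layer sizes, mask ratio, feature dimension (e.g., five spectral-band features over 62 channels), and number of emotion classes are assumptions for the example and do not reproduce the authors' MSMAE architecture (multi-scale representation and regional invariance learning are omitted here).

import torch
import torch.nn as nn

class MaskedEEGAutoencoder(nn.Module):
    """Hypothetical channel-masked autoencoder over per-channel EEG features."""
    def __init__(self, n_channels=62, n_features=5, d_model=64, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(n_features, d_model)            # per-channel feature embedding
        self.mask_token = nn.Parameter(torch.zeros(d_model))   # learned token for masked channels
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(d_model, n_features)          # reconstruct channel features

    def forward(self, x):                                      # x: (batch, channels, features)
        tokens = self.embed(x)
        mask = torch.rand(x.shape[:2], device=x.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        latent = self.encoder(tokens)
        recon = self.decoder(latent)
        loss = ((recon - x) ** 2)[mask].mean()                 # reconstruct masked channels only
        return loss, latent

# Stage 1: self-supervised pre-training on unlabeled multi-subject, multi-session EEG.
model = MaskedEEGAutoencoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
unlabeled = torch.randn(32, 62, 5)                             # placeholder batch of EEG features
loss, _ = model(unlabeled)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tuning with a small amount of labeled data from the target subject
# (a real implementation would typically reduce or disable masking at this stage).
classifier = nn.Linear(64, 3)                                  # e.g., 3 emotion classes as in SEED
labeled, labels = torch.randn(8, 62, 5), torch.randint(0, 3, (8,))
_, latent = model(labeled)
logits = classifier(latent.mean(dim=1))                        # pool channel tokens
ce_loss = nn.functional.cross_entropy(logits, labels)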
ISSN: 1534-4320
EISSN: 1558-0210
DOI: 10.1109/TNSRE.2024.3389037