A bidirectional cross-modal transformer representation learning model for EEG-fNIRS multimodal affective BCI

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 266, p. 126081
Main Authors: Si, Xiaopeng; Zhang, Shuai; Yang, Zhuobin; Yu, Jiayue; Ming, Dong
Format: Journal Article
Language: English
Published: Elsevier Ltd, 25.03.2025
Summary: By recognizing or regulating human emotions, affective brain–computer interfaces (BCIs) could improve human–computer interaction. However, human emotion involves complex temporal–spatial brain networks, so unimodal brain imaging methods have difficulty decoding complex human emotions. Multimodal brain imaging methods, which capture temporal–spatial multi-dimensional brain signals, have been successfully employed in non-affective BCIs and show great potential to improve affective BCIs. To explore an interpretable multimodal fusion model and improve emotion recognition performance for multimodal affective BCIs, in this study we propose a Temporal–Spatial Multimodal Fusion (TSMMF) model that leverages the bidirectional Cross-Modal Transformer (BCMT) to fuse electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) multimodal brain signals. Firstly, intra-modal feature extractors and a self-attention Transformer were employed to construct joint EEG-fNIRS multimodal representations, reducing inter-modal differences. Secondly, the BCMT was adopted to achieve temporal–spatial multimodal fusion, followed by attention fusion to adaptively adjust the weights of the temporal–spatial multimodal features. Thirdly, modality-specific branches were introduced to preserve the unique features of each modality, and the outputs of all branches were combined through a weighted sum for emotion recognition. Furthermore, the model learned the weights of emotion-related brain regions for each modality. Results showed that: (1) we proposed the first affective BCI based on multimodal brain imaging methods, and its emotion recognition outperformed state-of-the-art methods; (2) an accuracy of 76.15% was achieved for cross-subject emotion decoding, representing improvements of 6.06% and 12.44% over the EEG and fNIRS unimodal approaches, respectively; (3) spatial interpretability analysis indicated that the modality-specific branches focus on common brain regions, whereas the multimodal fusion branch emphasizes brain regions that differ across emotions. Collectively, our neuroscience-inspired method could advance the development of BCIs and multimodal brain signal decoding. Our code is available at: https://github.com/ThreePoundUniverse/TSMMF-ESWA/.
• The EEG-fNIRS affective brain–computer interface was proposed for the first time based on multimodal brain imaging methods.
• State-of-the-art performance was achieved for cross-subject emotion recognition on a multimodal brain imaging dataset.
• A bidirectional Cross-Modal Transformer data fusion framework was proposed for the first time for multimodal brain signals.
• Model interpretability revealed emotion-related spatial brain information.
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.126081
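The summary describes the BCMT as cross-attending between EEG and fNIRS in both directions and then adaptively weighting the fused temporal–spatial features. The authors' implementation is in the linked repository; the PyTorch sketch below only illustrates that general pattern under assumed token shapes and hidden sizes. All class names, dimensions, and the mean-pooling/sigmoid-gate fusion step are illustrative assumptions, not the TSMMF code.

```python
# Minimal sketch of bidirectional cross-modal attention between EEG and fNIRS
# token sequences with a learned fusion gate. Shapes, hyperparameters, and the
# pooling/gating choices are assumed for illustration only.
import torch
import torch.nn as nn


class CrossModalTransformerLayer(nn.Module):
    """One cross-attention block: queries come from the target modality,
    keys/values from the source modality, followed by a feed-forward network."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 128):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target attends to source: output has shape (batch, seq_target, d_model)
        attn_out, _ = self.cross_attn(query=target, key=source, value=source)
        x = self.norm1(target + attn_out)
        return self.norm2(x + self.ffn(x))


class BidirectionalCrossModalTransformer(nn.Module):
    """Fuses EEG and fNIRS representations in both directions
    (EEG -> fNIRS and fNIRS -> EEG), then blends the two fused streams
    with a learned attention-style gate."""

    def __init__(self, d_model: int = 64):
        super().__init__()
        self.eeg_from_fnirs = CrossModalTransformerLayer(d_model)
        self.fnirs_from_eeg = CrossModalTransformerLayer(d_model)
        # Fusion gate: a learnable scalar weight over the two directions.
        self.fusion_gate = nn.Sequential(nn.Linear(2 * d_model, 1), nn.Sigmoid())

    def forward(self, eeg: torch.Tensor, fnirs: torch.Tensor) -> torch.Tensor:
        eeg_fused = self.eeg_from_fnirs(eeg, fnirs)    # EEG queries attend to fNIRS
        fnirs_fused = self.fnirs_from_eeg(fnirs, eeg)  # fNIRS queries attend to EEG
        # Pool over temporal-spatial tokens to get one vector per direction.
        eeg_vec, fnirs_vec = eeg_fused.mean(dim=1), fnirs_fused.mean(dim=1)
        gate = self.fusion_gate(torch.cat([eeg_vec, fnirs_vec], dim=-1))
        return gate * eeg_vec + (1 - gate) * fnirs_vec  # (batch, d_model)


if __name__ == "__main__":
    # Dummy tokenized features: 8 trials, 32 EEG tokens and 20 fNIRS tokens of size 64.
    eeg_tokens = torch.randn(8, 32, 64)
    fnirs_tokens = torch.randn(8, 20, 64)
    fused = BidirectionalCrossModalTransformer(d_model=64)(eeg_tokens, fnirs_tokens)
    print(fused.shape)  # torch.Size([8, 64])
```

In the model described by the summary, a fused stream like this would be combined by weighted sum with the modality-specific EEG and fNIRS branches before the emotion classifier; the sketch stops at a single fused vector per trial.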