Joint low-rank tensor fusion and cross-modal attention for multimodal physiological signals based emotion recognition

Bibliographic Details
Published in: Physiological Measurement, Vol. 45, No. 7, pp. 75003–75016
Main Authors: Wan, Xin; Wang, Yongxiong; Wang, Zhe; Tang, Yiheng; Liu, Benke
Format: Journal Article
Language: English
Published: England, IOP Publishing, 01.07.2024

Summary: Objective. Physiological-signal-based emotion recognition is a prominent research domain in human-computer interaction. Previous studies predominantly focused on unimodal data, giving limited attention to the interplay among multiple modalities. Within multimodal emotion recognition, integrating information from diverse modalities and leveraging their complementary information are the two essential issues for obtaining robust representations. Approach. Thus, we propose an intermediate fusion strategy that combines low-rank tensor fusion with cross-modal attention to enhance the fusion of electroencephalogram, electrooculogram, electromyography, and galvanic skin response signals. First, handcrafted features from the distinct modalities are individually fed to corresponding feature extractors to obtain latent features. Subsequently, low-rank tensor fusion integrates the information into a modality interaction representation. Finally, a cross-modal attention module explores the potential relationships between the distinct latent features and the modality interaction representation, recalibrating the weights of the different modalities; the resultant representation is adopted for emotion recognition. Main results. To validate the effectiveness of the proposed method, we conduct subject-independent experiments on the DEAP dataset. The proposed method achieves accuracies of 73.82% and 74.55% for valence and arousal classification, respectively. Significance. The results of extensive experiments verify the outstanding performance of the proposed method.
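The two fusion stages described in the summary, low-rank tensor fusion followed by cross-modal attention over the modality interaction representation, can be sketched in NumPy. This is an illustrative toy only: the random weight factors stand in for learned parameters, and the dimensions, rank, and function names are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank_fusion(features, rank, out_dim, rng):
    """Fuse per-modality latent vectors via rank-R factors, approximating
    an outer-product tensor fusion without building the full tensor."""
    # append a constant 1 so unimodal and lower-order terms survive the product
    zs = [np.concatenate([z, [1.0]]) for z in features]
    # random factors stand in for learned modality-specific weights
    factors = [rng.standard_normal((rank, out_dim, z.shape[0])) for z in zs]
    proj = [np.einsum('rdi,i->rd', W, z) for W, z in zip(factors, zs)]
    # elementwise product across modalities, then sum over the rank dimension
    return np.prod(np.stack(proj), axis=0).sum(axis=0)

def cross_modal_attention(query, modal_feats, dim, rng):
    """Scaled dot-product attention: the fused interaction representation
    (query) attends over per-modality latents to recalibrate their weights."""
    q = rng.standard_normal((dim, query.shape[0])) @ query           # (dim,)
    kv = np.stack([rng.standard_normal((dim, z.shape[0])) @ z
                   for z in modal_feats])                            # (M, dim)
    scores = kv @ q / np.sqrt(dim)
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # attention weights, one per modality
    return w @ kv, w                   # weighted combination, (dim,)

# toy latent features for the four modalities: EEG, EOG, EMG, GSR
dims = [32, 16, 16, 8]
feats = [rng.standard_normal(d) for d in dims]

h = low_rank_fusion(feats, rank=4, out_dim=24, rng=rng)    # interaction rep.
rep, weights = cross_modal_attention(h, feats, dim=24, rng=rng)
print(h.shape, rep.shape)   # (24,) (24,)
```

The rank-1-sum structure is what keeps the fusion tractable: instead of materializing a tensor whose size is the product of all modality dimensions, each modality contributes a small (rank, out_dim) projection, and the elementwise product recovers the multiplicative interactions.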
Bibliography:PMEA-105650.R1
ISSN: 0967-3334
1361-6579
DOI:10.1088/1361-6579/ad5bbc