Visual-to-EEG cross-modal knowledge distillation for continuous emotion recognition


Bibliographic Details
Published in: Pattern Recognition, Vol. 130, p. 108833
Main Authors: Zhang, Su; Tang, Chuangao; Guan, Cuntai
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.10.2022

Summary:
•Taking the above visual and EEG models as the teacher and student, we develop a cross-modal knowledge distillation (KD) method that improves EEG-based continuous emotion recognition using visual knowledge.
•The standalone teacher and student, trained without knowledge distillation, already outperform the baseline.
•The student model taught by both the labels and the visual knowledge produces results that are statistically significantly better than those of its counterpart trained without knowledge distillation.
•To the best of the authors' knowledge, this is the first work on visual-to-EEG cross-modal knowledge distillation for continuous emotion recognition.
•The code is made publicly available.

The visual modality is one of the most dominant modalities in current continuous emotion recognition methods, whereas the EEG modality is comparatively less reliable due to intrinsic limitations such as subject bias and low spatial resolution. This work attempts to improve the continuous prediction of the EEG modality by using the dark knowledge from the visual modality. The teacher model is built with a cascaded convolutional neural network - temporal convolutional network (CNN-TCN) architecture, and the student model is built with TCNs; they are fed video frames and EEG average band-power features, respectively. Two data partitioning schemes are employed, i.e., trial-level random shuffling (TRS) and leave-one-subject-out (LOSO). The standalone teacher and student produce continuous predictions superior to the baseline method, and the visual-to-EEG cross-modal KD further improves the prediction with statistical significance, i.e., p-value < 0.01 for TRS and p-value < 0.05 for LOSO partitioning. The saliency maps of the trained student model show that the brain activity associated with an active valence state is not confined to precise brain areas; instead, it results from synchronized activity among various brain areas. Moreover, the fast beta and gamma waves, with frequencies of 18–30 Hz and 30–45 Hz, contribute the most to the human emotion process compared with the other bands. The code is available at https://github.com/sucv/Visual_to_EEG_Cross_Modal_KD_for_CER.
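The sketch below illustrates how such a visual-to-EEG distillation step might be wired up, assuming a PyTorch setting. The layer sizes, the band-power feature dimension, the concordance-correlation-coefficient (CCC) label loss, and the MSE distillation term are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

    # Minimal visual-to-EEG cross-modal KD sketch (PyTorch assumed).
    # All dimensions and loss weights below are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TCNBlock(nn.Module):
        """One dilated temporal convolution block with a residual connection."""
        def __init__(self, channels, dilation):
            super().__init__()
            self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                                  padding=dilation, dilation=dilation)
            self.act = nn.ReLU()

        def forward(self, x):                      # x: (batch, channels, time)
            return self.act(x + self.conv(x))

    class StudentTCN(nn.Module):
        """Student: TCN over EEG average band-power features
        (e.g., 32 channels x 5 bands = 160 features per step, an assumption)."""
        def __init__(self, in_dim=160, channels=64, depth=4):
            super().__init__()
            self.inp = nn.Conv1d(in_dim, channels, kernel_size=1)
            self.tcn = nn.Sequential(*[TCNBlock(channels, 2 ** i) for i in range(depth)])
            self.head = nn.Conv1d(channels, 1, kernel_size=1)   # continuous valence

        def forward(self, x):                      # x: (batch, in_dim, time)
            return self.head(self.tcn(self.inp(x))).squeeze(1)  # (batch, time)

    def ccc_loss(pred, target, eps=1e-8):
        """1 - concordance correlation coefficient, a common objective for
        continuous emotion recognition."""
        pred_mean, target_mean = pred.mean(), target.mean()
        cov = ((pred - pred_mean) * (target - target_mean)).mean()
        ccc = 2 * cov / (pred.var() + target.var()
                         + (pred_mean - target_mean) ** 2 + eps)
        return 1 - ccc

    def kd_step(student, teacher, eeg, frames, labels, alpha=0.5):
        """One training step: label loss plus a distillation term pulling the
        student's prediction toward the frozen visual teacher's prediction."""
        with torch.no_grad():
            soft_targets = teacher(frames)         # pretrained CNN-TCN on video frames
        preds = student(eeg)
        return ccc_loss(preds, labels) + alpha * nn.functional.mse_loss(preds, soft_targets)

Setting alpha to zero recovers the standalone student trained on labels only, which is the counterpart against which the distilled student is compared in the paper.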
ISSN: 0031-3203; 1873-5142
DOI: 10.1016/j.patcog.2022.108833