Distilling EEG representations via capsules for affective computing


Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 171, pp. 99-105
Main Authors: Zhang, Guangyi; Etemad, Ali
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.07.2023

Summary:
•We distill EEG representations via capsule-based architectures.
•We encourage a lightweight model to mimic a heavy model through distillation with privileged information.
•Our proposed framework performs well given the high compression rate and limited training samples.
•Our framework achieves state-of-the-art results on two public large EEG datasets.

Affective computing with Electroencephalogram (EEG) is a challenging task that requires cumbersome models to effectively learn the information contained in large-scale EEG signals, causing difficulties for real-time smart-device deployment. In this paper, we propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures for both classification and regression tasks. Our goal is to distill information from a heavy model to a lightweight model for subject-specific tasks. To this end, we first pre-train a large model (teacher network) on a large number of training samples. Then, we employ a lightweight model (student network) to mimic the teacher and learn the discriminative features embedded in the capsules using privileged knowledge. Such privileged information learned by the teacher contains similarities among capsules and is only available during the training stage of the student network. We evaluate the proposed architecture on two large-scale public EEG datasets, showing that our framework consistently enables student networks with different compression ratios to effectively learn from the teacher, even when provided with limited training samples. Lastly, our method achieves state-of-the-art results on one of the two datasets.
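The distillation idea described in the abstract (a student mimicking relational knowledge among the teacher's capsules) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the cosine-similarity choice, and the mean-squared-error objective are all assumptions for the sake of the example.

```python
import numpy as np

def capsule_similarity(capsules):
    """Pairwise cosine similarities among capsule vectors.

    capsules: array of shape (num_capsules, capsule_dim).
    Returns a (num_capsules, num_capsules) similarity matrix.
    """
    norms = np.linalg.norm(capsules, axis=1, keepdims=True)
    unit = capsules / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def similarity_distillation_loss(teacher_caps, student_caps):
    """Mean squared error between teacher and student capsule
    similarity matrices -- one way to express the 'privileged'
    relational knowledge that is only used while training the student."""
    t = capsule_similarity(teacher_caps)
    s = capsule_similarity(student_caps)
    return float(np.mean((t - s) ** 2))
```

In this sketch the loss is zero when the student reproduces the teacher's inter-capsule similarity structure exactly, and grows as the two structures diverge; a real training loop would add this term to the student's task loss.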
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2023.05.011