Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals

Bibliographic Details
Published in	Journal of translational medicine Vol. 23; no. 1; pp. 871 - 23
Main Authors	Li, Qi, Cao, Wei, Zhang, Anyuan
Format	Journal Article
Language	English
Published	England BioMed Central Ltd 06.08.2025 BioMed Central BMC
Subjects	Algorithms; Analysis; Convolutional neural network (CNN); Deep Learning; Detectors; Electric transformers; Electroencephalography; Electroencephalography (EEG) signals; Electroencephalography - methods; Epilepsy; Epilepsy - diagnosis; Epilepsy - physiopathology; Epileptic seizure detection; Humans; Multi-stream feature fusion (MSFF); Neural networks; Neural Networks, Computer; Seizures (Medicine); Seizures - diagnosis; Seizures - physiopathology; Signal Processing, Computer-Assisted; Vision transformer (ViT); Wavelet Analysis
Summary:	Automated seizure detection based on scalp electroencephalography (EEG) can significantly accelerate the epilepsy diagnosis process. However, most existing deep learning-based epilepsy detection methods fail to capture both the local features and the global time-series dependencies of EEG signals, which limits their seizure detection performance. Our study proposes an epilepsy detection model, CMFViT, based on a Multi-Stream Feature Fusion (MSFF) strategy that fuses a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The model converts EEG signals into time-frequency domain images using the Tunable Q-factor Wavelet Transform (TQWT), and then uses the CNN module to capture local features and the ViT module to capture global time-series correlations. It fuses the different feature representations through the MSFF strategy to enhance its discriminative ability, and completes the classification task through an average pooling layer and a fully connected layer. The effectiveness of the model was validated by experimental evaluations on the publicly available CHB-MIT dataset and the Kaggle 121 people epilepsy dataset. The model achieved 98.85% classification accuracy in single-subject experiments on the CHB-MIT dataset, and also demonstrated strong performance in cross-subject experiments on the Kaggle dataset. Ablation experiments demonstrate the complementary roles of the CNN and ViT modules, and their integration significantly improves detection accuracy and generalization. Comparisons with other methods highlight the advantages of the CMFViT model. The CMFViT model provides an efficient and accurate solution for complex EEG signal analysis and seizure detection in both single-subject and cross-subject settings, while laying the foundation for developing real-time, accurate seizure detection systems.
ISSN:	1479-5876
DOI:	10.1186/s12967-025-06862-z
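
The pipeline described in the Summary (time-frequency image, a local CNN stream and a global ViT stream, feature fusion, average pooling, fully connected classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' CMFViT implementation. Everything here is an assumption made for illustration: the function names, the patch size, the feature width, the random untrained weights, the use of concatenation as the fusion step, and a random array standing in for a TQWT time-frequency image (no TQWT is computed). The local stream is a single convolution whose kernel size equals its stride, written as a patch-wise matrix product, and the global stream is a single self-attention head.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def to_patches(image, patch):
    """Split an (H, W) image into flattened non-overlapping patches: (N, patch*patch)."""
    h, w = image.shape
    rows, cols = h // patch, w // patch
    blocks = image[: rows * patch, : cols * patch].reshape(rows, patch, cols, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(rows * cols, patch * patch)


def cnn_stream(patches, w_local):
    """Local stream: convolution with kernel size == stride, followed by ReLU."""
    return np.maximum(patches @ w_local, 0.0)


def vit_stream(patches, w_embed, w_q, w_k, w_v):
    """Global stream: patch embedding followed by one self-attention head."""
    tokens = patches @ w_embed
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # every patch attends to all patches
    return attn @ v


def init_params(patch=8, dim=16, n_classes=2, scale=0.1):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    p = patch * patch
    return {
        "w_local": rng.normal(0, scale, (p, dim)),
        "w_embed": rng.normal(0, scale, (p, dim)),
        "w_q": rng.normal(0, scale, (dim, dim)),
        "w_k": rng.normal(0, scale, (dim, dim)),
        "w_v": rng.normal(0, scale, (dim, dim)),
        "w_fc": rng.normal(0, scale, (2 * dim, n_classes)),
        "b_fc": np.zeros(n_classes),
    }


def classify(image, params, patch=8):
    """Two streams -> fusion (assumed: concatenation) -> average pooling -> FC -> softmax."""
    patches = to_patches(image, patch)
    local_feats = cnn_stream(patches, params["w_local"])
    global_feats = vit_stream(
        patches, params["w_embed"], params["w_q"], params["w_k"], params["w_v"]
    )
    fused = np.concatenate([local_feats, global_feats], axis=-1)  # (N, 2*dim)
    pooled = fused.mean(axis=0)  # average pooling over patches
    return softmax(pooled @ params["w_fc"] + params["b_fc"])


params = init_params()
tf_image = rng.normal(size=(32, 64))  # placeholder for a TQWT time-frequency image
probs = classify(tf_image, params)    # class probabilities (seizure vs. non-seizure)
```

Because the weights are random, the output probabilities carry no diagnostic meaning. The sketch only shows how the two feature streams are computed from the same patches and combined before classification.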