Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals


Bibliographic Details
Published in Journal of translational medicine Vol. 23; no. 1; pp. 871 - 23
Main Authors Li, Qi; Cao, Wei; Zhang, Anyuan
Format Journal Article
Language English
Published England BioMed Central Ltd 06.08.2025
BioMed Central
BMC
Summary: Automated seizure detection based on scalp electroencephalography (EEG) can significantly accelerate epilepsy diagnosis. However, most existing deep learning-based detection methods fall short in mining both the local features and the global time-series dependencies of EEG signals, which limits their seizure detection performance. This study proposes an epilepsy detection model, CMFViT, based on a Multi-Stream Feature Fusion (MSFF) strategy that fuses a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The model converts EEG signals into time-frequency images using the Tunable Q-factor Wavelet Transform (TQWT), then uses the CNN module to capture local features and the ViT module to capture global time-series correlations. The MSFF strategy fuses these feature representations to improve discriminative ability, and an average pooling layer followed by a fully connected layer performs the final classification. The model was evaluated on the publicly available CHB-MIT dataset and the Kaggle 121 people epilepsy dataset. It achieved 98.85% classification accuracy in single-subject experiments on the CHB-MIT dataset, alongside other strong metrics, and also performed strongly in cross-subject experiments on the Kaggle dataset. Ablation experiments demonstrate the complementary roles of the CNN and ViT modules, whose integration significantly improves detection accuracy and generalization. Comparisons with other methods highlight the advantages of CMFViT. The model offers an efficient and accurate approach to complex EEG signal analysis and seizure detection in both single-subject and cross-subject settings, and lays a foundation for real-time seizure detection systems.
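The two-stream idea in the summary can be illustrated with a toy sketch. This is not the authors' CMFViT implementation: the TQWT step is skipped and replaced by a hypothetical matrix of time-frame tokens, the CNN is stood in for by a fixed 1-D smoothing convolution, the ViT by a single self-attention pass without learned projections, and the MSFF fusion is assumed here to be plain concatenation. All function names and weights are illustrative.

```python
import math
import random


def local_stream(tokens, kernel=(0.25, 0.5, 0.25)):
    """CNN stand-in: 1-D convolution along time with zero padding."""
    n, half, dim = len(tokens), len(kernel) // 2, len(tokens[0])
    out = []
    for t in range(n):
        row = [0.0] * dim
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < n:  # frames outside the signal count as zeros
                for j in range(dim):
                    row[j] += w * tokens[src][j]
        out.append(row)
    return out


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_stream(tokens):
    """ViT stand-in: one self-attention pass with Q = K = V = tokens."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)  # every frame attends to every frame
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(dim)])
    return out


def fuse_and_classify(tokens, fc_weights, fc_bias=0.0):
    """Assumed fusion: concatenate both streams, average-pool, one FC unit."""
    fused = [loc + glo for loc, glo in
             zip(local_stream(tokens), global_stream(tokens))]
    pooled = [sum(col) / len(fused) for col in zip(*fused)]  # pool over time
    logit = sum(w * x for w, x in zip(fc_weights, pooled)) + fc_bias
    return pooled, 1.0 / (1.0 + math.exp(-logit))  # sigmoid "seizure" score


# Hypothetical input: 8 time frames, 4 frequency features each.
rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
fc_weights = [rng.gauss(0.0, 0.5) for _ in range(8)]  # untrained weights
pooled, prob = fuse_and_classify(tokens, fc_weights)
print(len(pooled), prob)
```

The local stream only mixes neighbouring frames, while the attention pass lets each frame draw on the whole sequence, which is the complementarity the summary attributes to the CNN and ViT modules. The weights are random, so the printed score carries no diagnostic meaning.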
ISSN: 1479-5876
DOI: 10.1186/s12967-025-06862-z