Early stroke diagnosis and evaluation based on pathological voice classification using speech enhancement

Bibliographic Details
Published in	Computers in biology and medicine Vol. 196; no. Pt C; p. 110940
Main Authors	Zhang, Jun; Qiu, Yiyi; Liu, Yingchen; Xiao, Yi; Yang, Jiayue; Yang, Xi; Ma, Ming; Song, Aiguo
Format	Journal Article
Language	English
Published	United States Elsevier Ltd 01.09.2025
Subjects	Deep learning; Early stroke diagnosis; Machine learning; Pathological voice recognition; Speech enhancement; Transfer learning
Summary:	Stroke usually occurs suddenly. Stroke prehospital screening tools heavily rely on medical knowledge and are subjective. As an essential aspect of stroke assessment, speech analysis provides a non-invasive and convenient approach to early stroke diagnosis (ESD), offering critical support for timely intervention and treatment. Nevertheless, real-world speech recordings are often affected by environmental noise, which can significantly reduce the accuracy and reliability of speech-based diagnostic systems. This paper aims to investigate the feasibility and effectiveness of ESD based on pathological voice classification and speech enhancement (SE). We propose a cascaded ESD framework consisting of an SE module and a recognition module. Stroke patients’ sustained vowels (SVs) and spontaneous speech (SS) signals are denoised by the SEWUNet-based SE model. The recognition module subsequently diagnoses stroke from the enhanced speech. For SVs, discrete handcrafted features were extracted; because such features offer strong interpretability and computational efficiency in pathological voice tasks, five classical machine learning algorithms (KNN, SVM, RF, DT, and AdaBoost) were employed to train both six-vowel and all-vowel recognition models. For SS, we used four data augmentation techniques to expand the dataset. Then, we extracted Mel-spectrogram features to train a CNN-Transformer model. Additionally, transfer learning was introduced by replacing the CNN with a pre-trained ResNet model to further improve performance. We trained all recognition models using five-fold cross-validation, with gender and age incorporated as physiological features. Based on a single-channel SEWUNet network, the SE module included separate enhancement models for SVs and SS. An early stopping mechanism was adopted during the training of the SS and enhancement models to prevent overfitting.
Results showed that the optimal models for SVs achieved high accuracy, sensitivity, specificity, and F1-score, all exceeding 90%. The two best models for SS surpassed 95%. The SEWUNet-based enhancement model improved speech quality metrics for both SVs and SS. Moreover, the recognition models trained on enhanced speech achieved approximately a 10% performance improvement compared to those trained on noisy speech. Ultimately, we designed a real-time ESD system using the AdaBoost and the CNN-Transformer models and conducted clinical trials and WeChat mini-program tests. Results demonstrated that among 34 subjects (24 patients and 10 healthy individuals), SVs achieved 85.29% accuracy, with four patients misclassified and one patient undetermined. By applying the proposed two-stage recognition strategy (SVs followed by SS), the system achieved 100% overall recognition accuracy. The proposed ESD method combining SE with SVs and SS can serve as an assistive diagnostic tool to help medical professionals and individuals detect and prevent strokes at an earlier stage, reduce workload, and improve identification objectivity. The code and experimental protocol of this paper are available at https://github.com/LiuYingchenseu/ESPVC.
• Developed machine learning-based models for sustained vowel recognition.
• Proposed a CNN-Transformer hybrid model for spontaneous speech analysis.
• Trained a SEWUNet-based model for speech enhancement to improve recognition performance.
• Proposed a two-stage recognition strategy that improved recognition accuracy and robustness.
• Designed an assisted diagnosis system with a WeChat mini-program for convenient clinical testing and early stroke diagnosis.
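The two-stage strategy (SVs followed by SS) and the reported 85.29% figure can be illustrated with a minimal Python sketch. The summary does not state how the two stages are combined, so the rule below is an assumption made only for illustration: a positive SV result is accepted, and a negative or undetermined SV result defers to the SS stage. The names `two_stage_decision`, `accuracy_percent`, `sv_stage`, and `ss_stage` are hypothetical and are not taken from the authors' repository.

```python
from typing import Callable, Literal

Label = Literal["stroke", "healthy", "undetermined"]


def two_stage_decision(
    sv_stage: Callable[[], Label],
    ss_stage: Callable[[], Label],
) -> Label:
    """Combine the two stages under an ASSUMED rule (not from the paper):
    accept a positive sustained-vowel (SV) result, otherwise defer to
    the spontaneous-speech (SS) stage."""
    first = sv_stage()
    if first == "stroke":
        return first
    # SV was negative or undetermined: fall back to the SS stage.
    return ss_stage()


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Figures from the summary: 34 subjects; at the SV stage four patients
# were misclassified and one was undetermined, leaving 29 correct.
sv_only_accuracy = accuracy_percent(34 - 4 - 1, 34)
print(sv_only_accuracy)
```

The stages are passed as callables so that the SS model is only run when the SV stage does not already return a positive result; a real system might instead combine scores or use confidence thresholds.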
ISSN:	0010-4825 1879-0534
DOI:	10.1016/j.compbiomed.2025.110940