Ischemic stroke prediction using machine learning in elderly Chinese population: The Rugao Longitudinal Ageing Study

Bibliographic Details
Published in Brain and behavior Vol. 13; no. 12; pp. e3307 - n/a
Main Authors Chang, Huai‐Wen, Zhang, Hui, Shi, Guo‐Ping, Guo, Jiang‐Hong, Chu, Xue‐Feng, Wang, Zheng‐Dong, Yao, Yin, Wang, Xiao‐Feng
Format Journal Article
Language English
Published United States John Wiley and Sons Inc 01.12.2023
Wiley
Summary: Objective: To compare logistic regression (LR) with machine learning (ML) models for predicting the risk of ischemic stroke in an elderly population in China. Methods: We used 2208 records from the Rugao Longitudinal Ageing Study (RLAS) to assess ischemic stroke risk prediction. Input variables included 103 phenotypes. For 3‐year ischemic stroke risk prediction, we compared the discrimination and calibration of the LR model with those of five ML methods: Random Forest (RF), Gaussian kernel Support Vector Machines (SVM), Multilayer Perceptron (MLP), K‐Nearest Neighbors (KNN), and Gradient Boosting Decision Tree (GBDT). Results: Age, pulse, waist circumference, education level, β2‐microglobulin, homocysteine, cystatin C, folate, free triiodothyronine, platelet distribution width, QT interval, and QTc interval were significant predictors of ischemic stroke. The ML approach drew on more biochemical and ECG‐related multidimensional phenotypic indicators than the LR model, which placed more importance on general demographic indicators. Compared to the LR model, SVM provided the best discrimination and calibration (C‐index: 0.79 vs. 0.71, 11.27% improvement in model utility), with the best performance in both validation and test data. Conclusion: In a comparison of LR with five ML models, combining ML with multiple phenotypes gave higher ischemic stroke prediction accuracy. Together with other studies of elderly populations in China, the results indicate that ML techniques, especially SVM, show good long‐term predictive performance, suggesting potential value for ML in clinical practice. Gaussian kernel SVM is an effective ML strategy for ischemic stroke risk prediction in a large population with a multidimensional phenotypic dataset.
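Note on the reported metric: the summary compares models by C‐index but does not say how the 11.27% figure was derived. The minimal sketch below assumes a binary outcome (stroke within 3 years or not), for which the C‐index is the share of case/non‐case pairs in which the case receives the higher predicted risk, and it assumes the 11.27% is the relative gain of 0.79 over 0.71, which the arithmetic happens to match. The function names and the toy data are hypothetical and are not taken from the study.

```python
from itertools import product


def c_index(y_true, scores):
    """Concordance index for a binary outcome (ties count as half)."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    # Compare every case with every non-case.
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy data (hypothetical): 1 = ischemic stroke, 0 = no stroke.
outcomes = [1, 0, 1, 0, 0]
predicted_risk = [0.9, 0.2, 0.4, 0.5, 0.1]
print(c_index(outcomes, predicted_risk))  # 5 of 6 pairs are concordant

# Assumption: the summary's 11.27% is the relative C-index gain of SVM over LR.
print(round(relative_improvement(0.79, 0.71), 2))
```

A C‐index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so both reported values (0.71 for LR, 0.79 for SVM) sit between those bounds.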
ISSN: 2162-3279
DOI: 10.1002/brb3.3307