Integration of the Extreme Gradient Boosting model with electronic health records to enable the early diagnosis of multiple sclerosis

Bibliographic Details
Published in Multiple sclerosis and related disorders Vol. 47; p. 102632
Main Authors Wang, Ruoning; Luo, Wenjing; Liu, Zifeng; Liu, Weilong; Liu, Chunxin; Liu, Xun; Zhu, He; Li, Rui; Song, Jiafang; Hu, Xueqiang; Han, Sheng; Qiu, Wei
Format Journal Article
Language English
Published Netherlands Elsevier B.V 01.01.2021
Summary:
• The performance of five algorithms in early diagnosis of MS was compared.
• Extreme Gradient Boosting (XGBoost) had higher recall, specificity, and precision.
• XGBoost showed the best performance in both training and test sets.
• 61%, 51%, and 49% of patients could be diagnosed with MS 1, 2, and 3 years earlier, respectively.
• Our model was effective in helping to reduce MS diagnostic delays.

Delayed multiple sclerosis (MS) diagnoses are not uncommon, and an early diagnostic tool is urgently warranted. We aimed to develop an effective tool, based on electronic health records and machine learning techniques, to recognize MS patients early among hospital visitors in China. Two case sets were collected from January 2016 to December 2018. The training set had 239 MS cases and 1142 controls, and the test set had 23 MS cases and 92 controls. The utility of Extreme Gradient Boosting (XGBoost), Random Forest (RF), Naive Bayes, K-nearest-neighbor (KNN) and Support Vector Machine (SVM) in early diagnosis of MS was evaluated by the area under the receiver operating characteristic curve, precision, recall, specificity, accuracy and F1 score. XGBoost performed the best and was used to generate the results. Thirty-four variables that were highly relevant to MS diagnosis were selected for the XGBoost model, and their relative importance for MS diagnosis was ranked. The training set recall was 0.632, with a precision of 0.576, and the test set recall was 0.609, with a precision of 0.609. Our study found that 61%, 51%, and 49% of the patients could be diagnosed with MS 1, 2, and 3 years earlier than their real diagnostic time point, respectively. A diagnostic tool for early MS recognition based on the XGBoost model and electronic health records was developed to help reduce diagnostic delays in MS.
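The evaluation measures named in the summary (precision, recall, specificity, accuracy and F1 score) all derive from the four cells of a binary confusion matrix. The Python sketch below is a minimal illustration of that arithmetic, not the authors' code. The names ConfusionCounts and binary_metrics are hypothetical. The cell counts are assumptions as well: the summary reports no confusion matrix, so the counts were chosen only to be consistent with the reported test set (23 MS cases, 92 controls, recall 0.609, precision 0.609), since 14/23 is approximately 0.609.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (MS = positive class)."""
    tp: int  # MS cases flagged as MS
    fp: int  # controls flagged as MS
    fn: int  # MS cases the model missed
    tn: int  # controls correctly left unflagged


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard metrics; assumes every denominator is non-zero."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)          # also called sensitivity
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# ASSUMED counts, not reported in the summary: picked so that
# tp + fn = 23 MS cases, fp + tn = 92 controls, and both recall
# and precision round to the reported 0.609.
test_counts = ConfusionCounts(tp=14, fp=9, fn=9, tn=83)
metrics = binary_metrics(test_counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only the recall and precision printed here can be checked against the summary. The specificity, accuracy and F1 values follow from the assumed counts and should not be read as results reported by the study.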
ISSN: 2211-0348, 2211-0356
DOI: 10.1016/j.msard.2020.102632