Integration of the Extreme Gradient Boosting model with electronic health records to enable the early diagnosis of multiple sclerosis
Published in | Multiple sclerosis and related disorders Vol. 47; p. 102632 |
---|---|
Main Authors | , , , , , , , , , , , |
Format | Journal Article |
Language | English |
Published | Netherlands: Elsevier B.V, 01.01.2021 |
Summary: | • The performance of five algorithms in the early diagnosis of MS was compared. • Extreme Gradient Boosting (XGBoost) had higher recall, specificity, and precision. • XGBoost showed the best performance in both the training and test sets. • 61%, 51%, and 49% of patients could be diagnosed with MS 1, 2, and 3 years earlier, respectively. • Our model was effective in helping to reduce MS diagnostic delays.
Delayed multiple sclerosis (MS) diagnoses are not uncommon, and an early diagnostic tool is urgently warranted. We aimed to develop an effective tool, based on electronic health records and machine learning techniques, to recognize MS patients early among hospital visitors in China.
Two case sets were collected from January 2016 to December 2018. The training set had 239 MS patients and 1142 controls, and the test set had 23 MS patients and 92 controls. The utility of Extreme Gradient Boosting (XGBoost), Random Forest (RF), Naive Bayes, K-nearest-neighbor (KNN), and Support Vector Machine (SVM) in the early diagnosis of MS was evaluated by the area under the receiver operating characteristic curve, precision, recall, specificity, accuracy, and F1 score.
XGBoost performed best and was used to generate the results. Thirty-four variables that were highly relevant to MS diagnosis were used in the XGBoost model, and their relative importance for MS diagnosis was ranked. The training set recall was 0.632, with a precision of 0.576, and the test set recall was 0.609, with a precision of 0.609. Our study found that 61%, 51%, and 49% of the patients could be diagnosed with MS 1, 2, and 3 years earlier than their actual time of diagnosis, respectively.
A diagnostic tool for early MS recognition based on the XGBoost model and electronic health records was developed to help reduce diagnostic delays in MS. |
---|---|
ISSN: | 2211-0348 2211-0356 |
DOI: | 10.1016/j.msard.2020.102632 |
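
The evaluation metrics named in the summary (precision, recall, specificity, accuracy, F1 score) can all be computed from a binary confusion matrix. The sketch below is illustrative only and is not the authors' code. The class name `ConfusionCounts` and the variable `test_counts` are hypothetical. The four counts are an inference, not figures reported in the record: they are the integer counts that reproduce the stated test-set recall and precision of 0.609 given 23 MS patients and 92 controls. Any other metric derived from them is therefore implied by that inference rather than quoted from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix, with MS as the positive class."""

    tp: int  # MS patients flagged as MS
    fp: int  # controls flagged as MS
    fn: int  # MS patients the model missed
    tn: int  # controls correctly left unflagged

    def recall(self) -> float:
        # Share of true MS patients that the model catches.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        # Share of flagged patients who really have MS.
        return self.tp / (self.tp + self.fp)

    def specificity(self) -> float:
        # Share of controls correctly left unflagged.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# ASSUMPTION: counts inferred from the rounded test-set figures
# (recall 0.609, precision 0.609, 23 MS patients, 92 controls).
# They are not reported in the record itself.
test_counts = ConfusionCounts(tp=14, fp=9, fn=9, tn=83)

print(f"recall    = {test_counts.recall():.3f}")
print(f"precision = {test_counts.precision():.3f}")
```

With a 1:4 ratio of cases to controls, accuracy alone can look favorable even when many MS patients are missed, which is one reason recall and precision are reported alongside it.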