Development and Validation of a Machine Learning Algorithm for Predicting Diabetes Retinopathy in Patients With Type 2 Diabetes: Algorithm Development Study

Bibliographic Details
Published in	JMIR Medical Informatics, Vol. 13, e58107
Main Authors	Kim, Sunyoung, Park, Jaeyu, Son, Yejun, Lee, Hojae, Woo, Selin, Lee, Myeongcheol, Lee, Hayeon, Sang, Hyunji, Yon, Dong Keon, Rhee, Sang Youl
Format	Journal Article
Language	English
Published	JMIR Publications, Canada, 7 February 2025
Subjects	Adult; Aged; Algorithms; Diabetes; Diabetes Mellitus, Type 2 - complications; Diabetic Retinopathy; Diabetic Retinopathy - diagnosis; Diabetic Retinopathy - etiology; Electronic Health Records; Female; Humans; Machine Learning; Male; Middle Aged; Original Paper; Republic of Korea - epidemiology; Risk Assessment - methods; Risk Factors; ROC Curve; Tools, Programs and Algorithms; Republic of Korea; diabetes; retinopathy; retinal; type 2 diabetes; comorbidities; prediction; ophthalmology; machine learning; algorithm

More Information
Summary:	Diabetic retinopathy (DR) is the leading cause of preventable blindness worldwide. Machine learning (ML) systems can enhance DR detection in community-based screening. However, the usability and predictive performance of such models are still being determined. This study used data from 3 university hospitals in South Korea to develop a simple and accurate ML-based model for predicting the risk of developing DR that can be universally applied to adults with type 2 diabetes mellitus (T2DM). DR was predicted using data from 2 independent electronic medical record sources: a discovery cohort (1 hospital, n=14,694) and a validation cohort (2 hospitals, n=1856). The primary outcome was the presence of DR at 3 years. Several ML models were built and selected through hyperparameter tuning in the discovery cohort, and the area under the receiver operating characteristic (ROC) curve was evaluated in both cohorts. Among the 14,694 patients screened for inclusion, 348 (2.37%) were diagnosed with DR. For DR prediction, the extreme gradient boosting (XGBoost) model had an accuracy of 75.13% (95% CI 74.10-76.17), a sensitivity of 71.00% (95% CI 66.83-75.17), and a specificity of 75.23% (95% CI 74.16-76.31) in the original dataset. In the validation dataset, XGBoost had an accuracy of 65.14%, a sensitivity of 64.96%, and a specificity of 65.15%. The most important feature in the XGBoost model was dyslipidemia, followed by cancer, hypertension, chronic kidney disease, neuropathy, and cardiovascular disease. This approach shows potential to improve patient outcomes by enabling timely intervention in patients with T2DM, improving understanding of contributing factors, and reducing DR-related complications. The proposed prediction model is expected to be both competitive and cost-effective, particularly in primary care settings in South Korea.
Notes:	Some authors contributed equally (not specified in this record). Conflicts of interest: none declared.
ISSN:	2291-9694
DOI:	10.2196/58107
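The summary reports accuracy, sensitivity, and specificity side by side. Because accuracy is the prevalence-weighted average of sensitivity and specificity, the three figures can be cross-checked against each other. The sketch below is a minimal illustration, not the authors' code; the helper name `weighted_accuracy` is hypothetical, and it assumes the evaluated set had the cohort-wide DR prevalence (348 of 14,694), which this record does not state.

```python
# Accuracy as a prevalence-weighted average of sensitivity and specificity:
#   accuracy = p * sensitivity + (1 - p) * specificity
# where p is the fraction of DR-positive patients in the evaluated set.


def weighted_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return overall accuracy (same units as the inputs) for a given prevalence.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Figures taken from the summary (percent), discovery cohort.
DR_CASES = 348
COHORT_SIZE = 14_694
SENSITIVITY = 71.00
SPECIFICITY = 75.23
REPORTED_ACCURACY = 75.13

# Assumption: the evaluated set has the cohort-wide DR prevalence.
prevalence = DR_CASES / COHORT_SIZE
implied = weighted_accuracy(SENSITIVITY, SPECIFICITY, prevalence)

if __name__ == "__main__":
    print(f"prevalence = {prevalence:.2%}")
    print(f"implied accuracy = {implied:.2f}% (reported {REPORTED_ACCURACY:.2f}%)")
```

Under that assumption the implied accuracy rounds to 75.13%, matching the reported value. Because DR is rare (about 2.37%), accuracy sits very close to specificity, so accuracy alone says little about how well DR cases are detected.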