Comparative study on risk prediction model of type 2 diabetes based on machine learning theory: a cross-sectional study

Bibliographic Details
Published in	BMJ Open Vol. 13; no. 8; p. e069018
Main Authors	Wang, Shu; Chen, Rong; Wang, Shuang; Kong, Danli; Cao, Rudai; Lin, Chunwen; Luo, Ling; Huang, Jialu; Zhang, Qiaoli; Yu, Haibing; Ding, Yuan Lin
Format	Journal Article
Language	English
Published	London: British Medical Journal Publishing Group, 29.08.2023
Series	Original research
Subjects	Accuracy; Artificial intelligence; Blood pressure; Cholesterol; Chronic illnesses; Cross-sectional studies; Decision trees; Diabetes; Disease; EPIDEMIOLOGY; General diabetes; Glucose; Machine learning; Neural networks; Patients; Public Health; Regression analysis; Risk factors; Sample size; STATISTICS & RESEARCH METHODS; Support vector machines; Variables
Summary:	Objectives: To compare the predictive performance of six models based on machine learning theory, providing a methodological reference for predicting the risk of type 2 diabetes mellitus (T2DM).
Setting and participants: This study was based on chronic disease risk factor surveillance data for Dongguan residents from 2016 to 2018. Multistage cluster random sampling was adopted at each monitoring site, and 4157 people were selected. From this initial population, individuals with more than 20% missing data were excluded, leaving 4106 subjects.
Design: The k-nearest neighbour (KNN) algorithm and the synthetic minority oversampling technique (SMOTE) were used to process the data. Single-factor analysis was used for the preliminary selection of variables, and 10-fold cross-validation was used to optimise the parameters of some models. Accuracy, precision, recall and the area under the receiver operating characteristic curve (AUC) were used to evaluate the predictive performance of the models, and the DeLong test was used to analyse differences in AUC between models.
Results: After balancing, the sample size increased to 8013, of whom 4023 were patients with T2DM and 3990 were controls. Among the six models, the back propagation neural network had the best predictive performance, with 93.7% accuracy, 94.6% precision, 92.8% recall and an AUC of 0.977, followed by the logistic regression model, the support vector machine model, the CART decision tree model and the C4.5 decision tree model. The deep neural network had the worst predictive performance, with 84.5% accuracy, 86.1% precision, 82.9% recall and an AUC of 0.845.
Conclusions: Six types of T2DM risk prediction models were constructed, and their predictive performance was compared across several indicators. On the selected data set, the back propagation neural network had the best predictive performance.
ISSN:	2044-6055
DOI:	10.1136/bmjopen-2022-069018
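The summary names four evaluation metrics (accuracy, precision, recall and AUC) without defining them. The Python below is a minimal sketch, not the authors' code, that computes all four for a binary classifier. It assumes labels coded 1 for T2DM and 0 for controls and a 0.5 decision threshold. The function names and the toy labels and scores are hypothetical and are not study data. AUC is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The DeLong test that the study used to compare AUCs is not implemented here.

```python
# Minimal sketch (not the authors' code): binary classification metrics,
# assuming labels coded 1 = T2DM and 0 = control (hypothetical coding).
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted cases that are true cases (0.0 if none predicted)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true cases that are predicted as cases (0.0 if no cases)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random case > score of random control); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    # Threshold the scores at 0.5 to obtain hard predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(round(accuracy(y_true, y_pred), 3))
    print(round(precision(y_true, y_pred), 3))
    print(round(recall(y_true, y_pred), 3))
    print(round(auc(y_true, scores), 3))
```

With the toy data, the demo prints 0.667 for accuracy, precision and recall, and 0.889 for AUC. The pairwise loop in `auc` is O(n_cases × n_controls) and only spells out the definition. For real analyses, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score` and `roc_auc_score`.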