Comparison of multiple linear regression and machine learning methods in predicting cognitive function in older Chinese type 2 diabetes patients

The prevalence of type 2 diabetes (T2D) has increased dramatically in recent decades, and there are increasing indications that dementia is related to T2D. Previous attempts to analyze such relationships principally relied on traditional multiple linear regression (MLR). However, recently developed...

Full description

Saved in:
Bibliographic Details
Published inBMC neurology Vol. 24; no. 1; p. 11
Main Authors Liu, Chi-Hao, Peng, Chung-Hsin, Huang, Li-Ying, Chen, Fang-Yu, Kuo, Chun-Heng, Wu, Chung-Ze, Cheng, Yu-Fang
Format Journal Article
LanguageEnglish
Published England BioMed Central Ltd 02.01.2024
BioMed Central
BMC
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The prevalence of type 2 diabetes (T2D) has increased dramatically in recent decades, and there are increasing indications that dementia is related to T2D. Previous attempts to analyze such relationships principally relied on traditional multiple linear regression (MLR). However, recently developed machine learning methods (Mach-L) outperform MLR in capturing non-linear relationships. The present study applied four different Mach-L methods to analyze the relationships between risk factors and cognitive function in older T2D patients, seeking to compare the accuracy between MLR and Mach-L in predicting cognitive function and to rank the importance of risks factors for impaired cognitive function in T2D. We recruited older T2D between 60-95 years old without other major comorbidities. Demographic factors and biochemistry data were used as independent variables and cognitive function assessment (CFA) was conducted using the Montreal Cognitive Assessment as an independent variable. In addition to traditional MLR, we applied random forest (RF), stochastic gradient boosting (SGB), Naïve Byer's classifier (NB) and eXtreme gradient boosting (XGBoost). Totally, the test cohort consisted of 197 T2D (98 men and 99 women). Results showed that all ML methods outperformed MLR, with symmetric mean absolute percentage errors for MLR, RF, SGB, NB and XGBoost respectively of 0.61, 0.599, 0.606, 0.599 and 0.2139. Education level, age, frailty score, fasting plasma glucose and body mass index were identified as key factors in descending order of importance. In conclusion, our study demonstrated that RF, SGB, NB and XGBoost are more accurate than MLR for predicting CFA score, and identify education level, age, frailty score, fasting plasma glucose, body fat and body mass index as important risk factors in an older Chinese T2D cohort.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1471-2377
1471-2377
DOI:10.1186/s12883-023-03507-w