Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort

Bibliographic Details
Published in	Endocrine Vol. 83; no. 3; pp. 604 - 614
Main Authors	Lin, Wei; Shi, Songchang; Lan, Huiyu; Wang, Nengying; Huang, Huibin; Wen, Junping; Chen, Gang
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2024; Springer Nature B.V.
Subjects	Body weight; China; China - epidemiology; Diabetes; East Asian People; Endocrinology; Humanities and Social Sciences; Humans; Internal Medicine; Interpretable; Learning algorithms; Machine Learning; Medicine; Medicine & Public Health; multidisciplinary; Original Article; Overweight; Overweight - epidemiology; Prediction model; Retrospective Studies; Risk; Risk Factors; Science; Visual discrimination learning
Summary:	Background: Identifying the risk factors associated with overweight is crucial for future health risk prediction and behavioral intervention. Several issues in machine learning, such as cross-validation, still lack consensus, and the resulting models may suffer from overfitting or poor interpretability. Methods: This study used nine common machine learning methods to construct overweight risk models. The study targeted the general community and included a total of 10,905 Chinese participants from Ningde City, Fujian Province, southeast China. The best model was selected through appropriate verification and validation and was then explained. Results: The machine learning overweight risk models performed well. CatBoost, as used in the construction of clinical risk models, may surpass earlier machine learning methods. The visual display of Shapley additive explanation (SHAP) values for the model variables accurately represented the influence of each variable in the model. Conclusions: Machine learning may currently be the best approach to constructing an overweight risk model, and CatBoost may be the best machine learning method. Furthermore, combining Shapley additive explanations with machine learning methods can be effective in identifying disease risk factors for prevention and control.
ISSN:	1559-0100; 1355-008X
DOI:	10.1007/s12020-023-03536-y
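
Note on the Summary: the Shapley additive explanation values it mentions attribute a model's prediction to its input variables. The sketch below is only an illustration of that idea and is not the study's code or model. It computes exact Shapley values by enumerating feature coalitions for a hypothetical toy risk score. The function names, the three features (age, waist, activity), their weights, and the single-baseline treatment of "absent" features are all assumptions made for illustration; practical SHAP tooling typically averages over a background dataset instead of one baseline point.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline point.

    Assumption: a feature that is absent from a coalition takes its
    baseline value. Cost grows as 2**n, so this suits only small n.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with coalition features taken from x
        # and all remaining features taken from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical linear risk score; not taken from the study."""
    age, waist, activity = z
    return 0.02 * age + 0.05 * waist - 0.3 * activity


x = [50, 90, 1]          # hypothetical individual
baseline = [40, 80, 2]   # hypothetical reference individual
phi = shapley_values(toy_risk, x, baseline)

for name, contribution in zip(["age", "waist", "activity"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the individual and the prediction for the baseline, which is the additivity property that makes such values usable for ranking variables by influence.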