Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model

Bibliographic Details
Published in	Journal of clinical laboratory analysis Vol. 34; no. 9; p. e23421
Main Authors	Su, Xi; Xu, Yongyong; Tan, Zhijun; Wang, Xia; Yang, Peng; Su, Yani; Jiang, Yangyang; Qin, Sijia; Shang, Lei
Format	Journal Article
Language	English
Published	New York: John Wiley & Sons, Inc., 01.09.2020
Subjects	Age; Blood pressure; Body mass index; Bronchitis; Cardiovascular disease; Cardiovascular diseases; Cholesterol; Diabetes; Family medical history; Gender; High density lipoprotein; Hypertension; Laboratories; Population; prediction model; Prediction models; random forest; Regression analysis; Risk assessment; Risk factors; Software; Stroke; Tuberculosis; Variables

More Information
Summary:	Background: To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests. Methods: A retrospective study involving 498 subjects was conducted at Xi'an Medical University between 2011 and 2018. The random forest algorithm was used to screen for the variables that most strongly affected CVD prediction and to establish a prediction model. The important variables were then entered into a multifactorial logistic regression analysis. The area under the curve (AUC) was compared between the logistic regression model and the random forest model. Results: The random forest model identified age, body mass index (BMI), fasting blood glucose (FBG), diastolic blood pressure (DBP), triglyceride (TG), systolic blood pressure (SBP), total cholesterol (TC), waist circumference, and high‐density lipoprotein‐cholesterol (HDL‐C) as the more important variables for CVD prediction; its AUC was 0.802. Multifactorial logistic regression analysis indicated that the risk factors for CVD were age [odds ratio (OR): 1.14, 95% confidence interval (CI): 1.10‐1.17, P < .001], BMI (OR: 1.13, 95% CI: 1.06‐1.20, P < .001), TG (OR: 1.11, 95% CI: 1.02‐1.22, P = .023), and DBP (OR: 1.04, 95% CI: 1.02‐1.06, P = .001); its AUC was 0.843. The established logistic regression prediction model was Logit P = Log[P/(1 − P)] = −11.47 + 0.13 × age + 0.12 × BMI + 0.11 × TG + 0.04 × DBP, with P = 1/[1 + exp(−Logit P)]. Subjects with P > .51 were considered prone to developing CVD. Conclusions: A prediction model for CVD in the general population was developed based on random forests, providing a simple tool for the early prediction of CVD.
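The logistic regression formula in the Summary can be evaluated directly. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, the example subjects are invented, and the units (age in years, BMI in kg/m², TG in mmol/L, DBP in mmHg) are assumptions, since the Summary does not state them. The coefficients and the .51 cutoff are taken from the Summary as printed (rounded to two decimals).

```python
import math

# Coefficients as reported in the Summary (rounded there to two decimals).
INTERCEPT = -11.47
COEFFICIENTS = {"age": 0.13, "bmi": 0.12, "tg": 0.11, "dbp": 0.04}
CUTOFF = 0.51  # reported threshold: P > .51 is treated as prone to CVD


def cvd_logit(age, bmi, tg, dbp):
    """Logit P = -11.47 + 0.13*age + 0.12*BMI + 0.11*TG + 0.04*DBP.

    Units are assumed (years, kg/m^2, mmol/L, mmHg); the Summary omits them.
    """
    values = {"age": age, "bmi": bmi, "tg": tg, "dbp": dbp}
    return INTERCEPT + sum(COEFFICIENTS[k] * values[k] for k in COEFFICIENTS)


def cvd_probability(age, bmi, tg, dbp):
    """P = 1 / (1 + exp(-Logit P))."""
    return 1.0 / (1.0 + math.exp(-cvd_logit(age, bmi, tg, dbp)))


def is_high_risk(age, bmi, tg, dbp):
    """Apply the reported cutoff to the predicted probability."""
    return cvd_probability(age, bmi, tg, dbp) > CUTOFF


if __name__ == "__main__":
    # Hypothetical subjects, invented purely for illustration.
    for subject in (
        {"age": 60, "bmi": 25.0, "tg": 1.5, "dbp": 80},
        {"age": 30, "bmi": 20.0, "tg": 1.0, "dbp": 70},
    ):
        p = cvd_probability(**subject)
        print(subject, f"P = {p:.3f}", "high risk" if p > CUTOFF else "not high risk")
```

Exponentiating a coefficient gives the corresponding odds ratio, so the rounded coefficients can be checked against the reported ORs (for example, exp(0.13) ≈ 1.14 for age). Because the coefficients are rounded, probabilities computed this way are approximate.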
Bibliography:	Funding information: This study was supported by the Special Scientific Research Program of the Department of Education of Shaanxi Province, China (Grant No. 19JK0770).
ISSN:	0887-8013; 1098-2825
DOI:	10.1002/jcla.23421