Machine learning-based prediction of LDL cholesterol: performance evaluation and validation

Bibliographic Details
Published in	PeerJ (San Francisco, CA) Vol. 13; p. e19248
Main Authors	Meng, Jing-Bi; An, Zai-Jian; Jiang, Chun-Shan
Format	Journal Article
Language	English
Published	United States PeerJ. Ltd 09.04.2025 PeerJ, Inc
Subjects	Accuracy; Adult; Aged; Algorithms; Biochemistry; Cardiovascular diseases; China; Cholesterol; Cholesterol, LDL - blood; Correlation coefficients; Data mining; Data Mining and Machine Learning; Data Science; Datasets; Decision trees; Estimates; Female; High density lipoprotein; Humans; Hypertriglyceridemia; Hypertriglyceridemia - blood; Learning algorithms; Lipids; Lipoproteins; Low density lipoprotein; Low density lipoproteins; Low-density lipoprotein cholesterol; Machine Learning; Male; Medical laboratories; Methods; Middle Aged; Multilayer perceptrons; Performance evaluation; Phenols; Quality control; Regression analysis; Risk assessment; Risk Assessment - methods; Triglyceride; Triglycerides; Triglycerides - blood

More Information
Summary:	This study aimed to validate and optimize a machine learning algorithm for accurately predicting low-density lipoprotein cholesterol (LDL-C) levels, addressing limitations of traditional formulas, particularly in hypertriglyceridemia. Various machine learning models, namely linear regression, K-nearest neighbors (KNN), decision tree, random forest, eXtreme Gradient Boosting (XGB), and multilayer perceptron (MLP) regressor, were compared with conventional formulas (Friedewald, Martin, and Sampson) using lipid profiles from 120,174 subjects (2020-2023). Predictive performance was evaluated against measured LDL-C values using R-squared (R²), mean squared error (MSE), and the Pearson correlation coefficient (PCC). Machine learning models outperformed the traditional formulas, with random forest and XGB achieving the highest accuracy (R² = 0.94, MSE = 89.25) on the internal dataset. Among the traditional formulas, the Sampson method performed best but showed reduced accuracy in the high-triglyceride (TG) group (TG > 300 mg/dL), whereas the machine learning models maintained high predictive power across all TG levels. Machine learning models therefore offer more accurate LDL-C estimates, especially in high-TG contexts where traditional formulas are less reliable. These models could enhance cardiovascular risk assessment by providing more precise LDL-C estimates, potentially leading to more informed treatment decisions and improved patient outcomes.
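The comparison described in the summary can be illustrated with a minimal sketch. It is not the authors' code. It assumes the commonly published forms of the Friedewald and Sampson equations (all values in mg/dL) and implements the three reported metrics (R², MSE, PCC) in plain Python. The Martin method is omitted because it depends on a lookup table of adjustable factors. The function names and the sample lipid panels are hypothetical and chosen only for illustration; they are not data from the study.

```python
from math import sqrt

def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate (mg/dL): LDL-C = TC - HDL-C - TG/5."""
    return tc - hdl - tg / 5.0

def sampson_ldl(tc, hdl, tg):
    """Sampson estimate (mg/dL), using the commonly cited coefficients
    (an assumption here; check against the original publication)."""
    non_hdl = tc - hdl
    return (tc / 0.948 - hdl / 0.971
            - (tg / 8.56 + tg * non_hdl / 2140.0 - tg ** 2 / 16100.0)
            - 9.44)

def mse(y_true, y_pred):
    """Mean squared error between measured and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)

if __name__ == "__main__":
    # Hypothetical panels: (TC, HDL-C, TG, measured LDL-C), all in mg/dL.
    panels = [
        (200.0, 50.0, 150.0, 122.0),
        (240.0, 40.0, 320.0, 150.0),
        (180.0, 60.0, 90.0, 101.0),
        (260.0, 45.0, 400.0, 160.0),
    ]
    measured = [p[3] for p in panels]
    for name, formula in (("Friedewald", friedewald_ldl), ("Sampson", sampson_ldl)):
        estimated = [formula(tc, hdl, tg) for tc, hdl, tg, _ in panels]
        print(name,
              "R2=%.3f" % r_squared(measured, estimated),
              "MSE=%.2f" % mse(measured, estimated),
              "PCC=%.3f" % pearson(measured, estimated))
```

A machine learning regressor would be scored the same way: its predictions replace `estimated`, and the metrics are then compared within TG strata (for example, TG > 300 mg/dL) to see where the formulas lose accuracy.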
ISSN:	2167-8359 2376-5992
DOI:	10.7717/peerj.19248