Comparison of linear, penalized linear and machine learning models predicting hospital visit costs from chronic disease in Thailand
Generally, health care costs from chronic diseases have positive skew and this gives problems on using traditional statistical models. Machine learning is a conventional method producing accurate prediction with large sample size. However, much of the comparison performance between statistical metho...
Saved in:
Published in | Informatics in medicine unlocked Vol. 26; p. 100769 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published |
Elsevier Ltd
2021
Elsevier |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Generally, health care costs from chronic diseases have positive skew and this gives problems on using traditional statistical models. Machine learning is a conventional method producing accurate prediction with large sample size. However, much of the comparison performance between statistical methods and machine learning for such data remains scattered. This study aimed to compare linear, penalized linear and machine learning models for their prediction performance of hospital visit costs from chronic disease, in Thailand. A total of 18,342 hospital visit records were obtained from Suratthani tertiary hospital in southern Thailand, which contained data from 2016 on chronic patients of Diagnosis-Related Groups (DRGs). The prediction performance on hospital visit costs by linear, penalized linear and machine learning models were compared using both original dataset and datasets expanded in size two- and four-fold by using bootstrap. The mean age of patients was 56.3 ± 22.6 years with 55.6% of visits by males. The median hospital cost was 16,662 Baht per visit. The random forest (RF) model had the best predictive performance of hospital visit costs for all sizes of dataset with the smallest prediction errors, whereas ridge linear regression had the poorest prediction performance with the largest prediction errors. Machine learning models had better prediction performance with enlarged sample sizes whereas linear and penalized linear models did not. On modeling big data for prediction, machine learning models are preferable, whereas linear and penalized linear models' predictions are not affected by increasing the sample size.
•Random forest model has the best predictive performance for hospital visit costs.•Prediction of linear regression and penalized linear regression models are not affected by the increasing sample size.•Machine learning models are appropriate for predication performance in large sample size. |
---|---|
ISSN: | 2352-9148 2352-9148 |
DOI: | 10.1016/j.imu.2021.100769 |