Regularization of machine learning models with penalized regression on data containing multicollinearity (Case study: prediction of the Human Development Index in 34 provinces of Indonesia)
Published in | Majalah Ilmiah Matematika dan Statistika Vol. 24; no. 1; pp. 12 - 26 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Department of Mathematics FMIPA University of Jember, 14.03.2024 |
Summary: | This research models high-dimensional data that contain multicollinearity using four machine-learning algorithms: Random Forest, K-Nearest Neighbor (kNN), XGBoost, and Regression Tree. Before modeling, regularization was carried out with three penalized regression methods: ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and Elastic Net regression. The empirical data consist of 100 predictor variables and 1 response variable taken from the 2022 Human Development Index data for 34 provinces in Indonesia published by BPS; the data were standardized. A simulation was also run on highly correlated data drawn from two distributions, uniform and normal, with parameter values taken from the empirical data. The results show that ridge regularization is the best method for producing accurate and stable predictions. Furthermore, the root mean square error (RMSE) did not differ between standardized and unstandardized data. Across all the data analyzed, the kNN model outperformed the other models on the simulated data, while the Random Forest and XGBoost models outperformed the others on the empirical data. The Regression Tree model is not recommended on the basis of these results.
Keywords: regularization, multicollinearity, ridge, LASSO, elastic net. MSC2020: 62J07 |
---|---|
ISSN: | 1411-6669 2722-9866 |
DOI: | 10.19184/mims.v24i1.40360 |
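The ridge step described in the summary can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic, the sample sizes (34 rows, 100 columns) are borrowed from the summary only for scale, and the helper names (`ridge_fit`, `rmse`), the train/test split, and the penalty grid are assumptions made for illustration. It uses the standard closed-form ridge estimate on standardized, highly correlated predictors and reports the test RMSE for a few penalty values.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y.

    Assumes X is standardized and y is centered, so no intercept is fitted.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic, highly correlated predictors (NOT the BPS data).
rng = np.random.default_rng(0)
n, p = 34, 100
base = rng.normal(size=(n, 1))
X = base + 0.05 * rng.normal(size=(n, p))  # every column is close to `base`
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n)

# Hypothetical hold-out split: first 24 rows for training, last 10 for testing.
train, test = np.arange(0, 24), np.arange(24, n)

# Standardize with training statistics only, then center the response.
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
X_train, X_test = (X[train] - mu) / sd, (X[test] - mu) / sd
y_mean = y[train].mean()

results, norms = {}, {}
for lam in (0.1, 1.0, 10.0):  # illustrative penalty grid
    beta = ridge_fit(X_train, y[train] - y_mean, lam)
    results[lam] = rmse(y[test], X_test @ beta + y_mean)
    norms[lam] = float(np.linalg.norm(beta))

for lam in results:
    print(f"lambda={lam:>5}: test RMSE={results[lam]:.3f}, ||beta||={norms[lam]:.3f}")
```

The coefficient norm shrinks as the penalty grows, which is the mechanism that stabilizes estimates when predictors are nearly collinear. LASSO and Elastic Net have no closed form and would need an iterative solver, so they are left out of this sketch.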