Comparison of tree-based ensemble models for regression

Bibliographic Details
Published in Communications for Statistical Applications and Methods Vol. 29; no. 5; pp. 561-589
Main Authors Park, Sangho; Kim, Chanmin
Format Journal Article
Language Korean
Published The Korean Statistical Society (한국통계학회) 30.09.2022
Summary: Tree-based ensemble models, such as random forest (RF) and Bayesian additive regression trees (BART), are produced by combining multiple classification and regression trees. In this study, we compare the model structures and performance of these ensemble models in regression settings. RF fits each tree to a bootstrapped sample and selects the splitting variable at each node from a randomly drawn subset of predictors. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, we investigate the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional, highly correlated data. However, RF has a shorter computation time in all of the scenarios considered. The performance of the two methods is also compared on two real data sets that represent the aforementioned situations, and the same conclusions are reached.
Bibliography: The Korean Statistical Society
KISTI1.1003/JNL.JAKO202228453771641
ISSN: 2287-7843
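
Illustrative note: the RF mechanism named in the summary (bootstrapped samples plus a random choice of candidate splitting variables at each node) can be sketched in a few lines. The code below is a minimal sketch and not the authors' implementation. It assumes one-split trees (stumps) instead of fully grown trees, synthetic data, and hypothetical names (`fit_stump`, `fit_forest`, `mtry`). BART is not sketched here, since Bayesian backfitting requires an MCMC sampler.

```python
import numpy as np

def fit_stump(X, y, rng, mtry):
    """Fit a one-split regression tree, searching only `mtry` randomly drawn predictors."""
    p = X.shape[1]
    features = rng.choice(p, size=mtry, replace=False)  # random predictor subset
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in features:
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, yl.mean(), yr.mean())
    if best is None:  # every sampled predictor was constant: predict the mean
        m = y.mean()
        return (0, np.inf, m, m)
    return best[1:]

def predict_stump(stump, X):
    j, t, left_mean, right_mean = stump
    return np.where(X[:, j] <= t, left_mean, right_mean)

def fit_forest(X, y, n_trees=50, mtry=2, seed=0):
    """Fit each stump on its own bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of row indices
        forest.append(fit_stump(X[idx], y[idx], rng, mtry))
    return forest

def predict_forest(forest, X):
    """Average the predictions of all trees in the ensemble."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)

# Synthetic data (an assumption for illustration): only predictor 0 matters.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))
y = 3.0 * (X[:, 0] > 0.5) + rng.normal(scale=0.1, size=200)

forest = fit_forest(X, y)
mse_forest = np.mean((predict_forest(forest, X) - y) ** 2)
mse_baseline = np.mean((y.mean() - y) ** 2)  # always predicting the mean
print(f"forest MSE: {mse_forest:.3f}, mean-only MSE: {mse_baseline:.3f}")
```

Because each stump sees only `mtry` of the predictors, some trees never consider the informative one. Averaging over the ensemble still recovers much of the signal, which is the variance-reduction idea behind RF.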