Comparison of tree-based ensemble models for regression
Published in | Communications for Statistical Applications and Methods, Vol. 29, No. 5, pp. 561-589 |
---|---|
Main Authors | |
Format | Journal Article |
Language | Korean |
Published | The Korean Statistical Society (한국통계학회), 30 September 2022 |

Summary: | When multiple classification and regression trees are combined, they form tree-based ensemble models such as random forest (RF) and Bayesian additive regression trees (BART). In this study, we compare the model structures and performance of these ensemble models in regression settings. RF fits each tree to a bootstrap sample and, at each node, chooses the splitting variable from a randomly selected subset of predictors. BART is specified as a sum of trees and is fitted using a Bayesian backfitting algorithm. Through extensive simulation studies, we investigate the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, and highly correlated data. With missing data, BART generally performs well, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional and highly correlated data. However, RF has a shorter computation time in every scenario considered. Comparing the two methods on two real data sets that represent these situations leads to the same conclusions. |
---|---|
Bibliography: | The Korean Statistical Society; KISTI1.1003/JNL.JAKO202228453771641 |
ISSN: | 2287-7843 |
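The summary describes RF as fitting each tree to a bootstrap sample and choosing each node's splitting variable from a random subset of predictors. The following is a minimal, standard-library Python sketch of that idea, not the authors' implementation. The function names, the parameters `mtry` (number of predictors tried per node) and `min_leaf` (minimum leaf size), and the exhaustive squared-error split search are illustrative assumptions.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def grow_tree(X, y, mtry, min_leaf, rng):
    """Grow one regression tree; each node searches only `mtry` randomly sampled predictors."""
    if len(y) < 2 * min_leaf:
        return ("leaf", sum(y) / len(y))
    best = None  # (total SSE of children, feature index, threshold)
    for j in rng.sample(range(len(X[0])), mtry):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no admissible split among the sampled predictors
        return ("leaf", sum(y) / len(y))
    _, j, t = best
    go_left = [row[j] <= t for row in X]
    children = []
    for flag in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == flag]
        ys = [v for v, g in zip(y, go_left) if g == flag]
        children.append(grow_tree(Xs, ys, mtry, min_leaf, rng))
    return ("split", j, t, children[0], children[1])


def predict_tree(node, x):
    """Follow splits from the root down to a leaf and return its mean."""
    while node[0] == "split":
        _, j, t, left, right = node
        node = left if x[j] <= t else right
    return node[1]


def random_forest(X, y, n_trees=50, mtry=1, min_leaf=5, seed=0):
    """Hypothetical helper: fit n_trees trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, min_leaf, rng))

    def predict(x):
        # The ensemble prediction is the average of the individual tree predictions
        return sum(predict_tree(tree, x) for tree in trees) / len(trees)

    return predict
```

Setting `mtry` equal to the number of predictors turns this into plain bagging; a small `mtry` decorrelates the trees, which is the design choice that distinguishes RF.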
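BART is described as a sum of trees fitted by Bayesian backfitting. Each tree is updated in turn against the partial residuals left by all the other trees, and the noise variance is then redrawn. The sketch below illustrates only that backfitting loop, under heavy and hypothetical simplifications. Every tree is a stump whose split is drawn once at random and never changed, so only the leaf means (conjugate normal draws) and sigma^2 (an inverse-gamma draw) are sampled. Full BART also proposes changes to each tree's structure with Metropolis-Hastings steps and uses its own prior defaults. The prior settings `tau`, `nu`, and `lam` here are assumptions chosen for illustration.

```python
import math
import random
import statistics


def sum_of_stumps_backfit(X, y, m=20, n_iter=200, burn=50, seed=0):
    """Simplified Bayesian backfitting for y = sum_k g_k(x) + eps, eps ~ N(0, sigma^2).

    Hypothetical simplification: each g_k is a stump with a fixed random split;
    only leaf means and sigma^2 are sampled (BART also samples tree structures)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    ybar = statistics.fmean(y)
    yc = [v - ybar for v in y]                     # center the response
    tau = (max(y) - min(y)) / (4 * math.sqrt(m))   # assumed leaf-prior sd, shrinks as m grows
    nu, lam = 3.0, statistics.variance(y)          # assumed inverse-gamma prior on sigma^2
    stumps = []
    for _ in range(m):
        j = rng.randrange(p)
        stumps.append((j, X[rng.randrange(n)][j]))  # (feature, threshold), held fixed
    leaf = [[0.0, 0.0] for _ in range(m)]          # (left, right) leaf means per stump
    fit = [[0.0] * n for _ in range(m)]            # each stump's current fitted values
    total = [0.0] * n                              # sum of all stumps' fitted values
    sigma2 = lam
    draws = []
    for it in range(n_iter):
        for k, (j, t) in enumerate(stumps):
            # Partial residual: remove every stump's fit except stump k's own
            r = [yc[i] - total[i] + fit[k][i] for i in range(n)]
            for side in (0, 1):
                idx = [i for i in range(n) if int(X[i][j] > t) == side]
                prec = len(idx) / sigma2 + 1 / tau ** 2   # conjugate normal posterior precision
                mean = sum(r[i] for i in idx) / sigma2 / prec
                leaf[k][side] = rng.gauss(mean, math.sqrt(1 / prec))
            for i in range(n):
                new = leaf[k][int(X[i][j] > t)]
                total[i] += new - fit[k][i]
                fit[k][i] = new
        ssr = sum((yc[i] - total[i]) ** 2 for i in range(n))
        # sigma^2 | rest ~ InvGamma((nu + n)/2, (nu*lam + ssr)/2), drawn as 1 / Gamma
        sigma2 = 1 / rng.gammavariate((nu + n) / 2, 2 / (nu * lam + ssr))
        if it >= burn:
            draws.append(([row[:] for row in leaf], sigma2))

    def predict(x):
        # Posterior-mean prediction: average the sum of stumps over retained draws
        vals = [sum(lf[k][int(x[j] > t)] for k, (j, t) in enumerate(stumps)) for lf, _ in draws]
        return ybar + sum(vals) / len(vals)

    return predict, draws
```

Because each stump is refitted only to what the others leave unexplained, no single tree has to model the whole response; the shrinking prior scale `tau` keeps each contribution small, mirroring the sum-of-weak-trees idea behind BART.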