A study on the use of bootstrap aggregation methods in estimation of stable parameters
Published in | Journal of biostatistics and epidemiology Vol. 2; no. 2 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Tehran University of Medical Sciences 01.12.2016 |
Subjects | |
Online Access | Get full text |
ISSN | 2383-4196 2383-420X |
Summary: | Background & Aim: In many medical studies, a single data set is used both to construct the model and to test its performance. This approach is prone to over-optimization and yields statistics with a low chance of external validity. Data splitting can be used to create training and test sets, but at the cost of reduced power. The aim of this study was to demonstrate the ability of bootstrap aggregating (bagging) to improve the performance of classification and regression tree (CART) models. Methods & Materials: CART was applied to a sample of 404 subjects to identify the factors that encourage people to change their body shape through cosmetic surgery. The sensitivity and specificity of the models were assessed by comparing the known status of subjects with their predicted group. First, all data were used both to construct the tree and to test its performance. Second, the model was fitted on half of the data and tested on the other half. Third, bagging was applied with 100 bootstrap samples: for each bootstrap sample, a tree was constructed and its performance was tested on the subjects not selected into that sample. The final group prediction for each subject was determined by majority vote. Results: When the whole data set was used, the overall accuracy was 59%. With the held-out test set and with bagging, accuracy fell to 53% and 56%, respectively. The corresponding sensitivities were 60%, 52%, and 55%. Conclusion: Bagging corrected the performance estimates for over-optimization. The bagging method produces statistics that have a higher chance of external validity. |
---|---|
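
The bagging procedure described in the summary (draw bootstrap samples, fit a tree on each, test it on the subjects left out of that sample, then take a majority vote per subject) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: a one-split decision stump stands in for a full CART tree, the data are synthetic, and all function names (`fit_stump`, `bagged_oob_predictions`, `sensitivity_specificity`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree (a stand-in for CART, assumed for brevity).

    Returns (feature index, threshold, left label, right label), chosen to
    minimise the number of misclassified training subjects.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(yi != l_lab for yi in left)
            errors += sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def bagged_oob_predictions(X, y, n_boot=100, seed=0):
    """Majority vote per subject, using only trees that did not see it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # subject was not selected: use it for testing
                votes[i][predict_stump(stump, X[i])] += 1
    # None marks a subject that was selected into every bootstrap sample
    return [v.most_common(1)[0][0] if v else None for v in votes]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for 0/1 labels, skipping None predictions."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Synthetic, hypothetical data: two predictors and a noisy binary outcome.
    rng = random.Random(1)
    X = [[rng.random(), rng.random()] for _ in range(100)]
    y = [1 if row[0] + rng.gauss(0, 0.2) > 0.5 else 0 for row in X]
    preds = bagged_oob_predictions(X, y, n_boot=100)
    print(sensitivity_specificity(y, preds))
```

Because each subject is scored only by trees fitted without it, the resulting sensitivity and specificity are not inflated by testing a model on the data that built it, which is the correction the summary attributes to bagging.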