A study on the use of bootstrap aggregation methods in estimation of stable parameters

Bibliographic Details
Published in	Journal of biostatistics and epidemiology Vol. 2; no. 2
Main Authors	Morteza Rostami, Behshid Garrusi, Mohamad Reza Baneshi
Format	Journal Article
Language	English
Published	Tehran University of Medical Sciences 01.12.2016
Subjects	Bagging; Bootstrap aggregating; Classification and regression tree (CART); Data splitting; External validity
ISSN	2383-4196 2383-420X

Summary:	Background & Aim: In many medical studies, a single data set is used both to construct the model and to test its performance. This approach is prone to over-optimization and yields performance statistics with a low chance of external validity. Data splitting can be used to create training and test sets, but at the cost of reduced power. The aim of this study was to demonstrate the ability of bootstrap aggregating (bagging) to improve the performance of classification and regression tree (CART) models. Methods & Materials: CART was applied to a sample of 404 subjects to identify the factors that encourage people to change their body shape through cosmetic surgery. Sensitivity and specificity of the models were calculated by comparing each subject's known status with the predicted group. Firstly, all of the data were used both to construct the tree and to test its performance. Secondly, the model was fitted on one half of the data and tested on the other half. Thirdly, bagging was applied, in which we drew 100 bootstrap samples. A tree was constructed on each bootstrap sample, and its performance was tested on the subjects not selected into that sample. The final group prediction for each subject was determined by majority voting. Results: When the whole data set was used, overall accuracy was 59%. Accuracy fell to 53% in the test data set and to 56% under bagging. The corresponding figures for sensitivity were 60%, 52%, and 55%, respectively. Conclusion: Bagging corrected the performance estimates for over-optimization. The bagging method produces statistics that have a higher chance of external validity.
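The bagging procedure described in the summary (100 bootstrap samples, one tree per sample, testing on the unselected subjects, majority voting) can be sketched as follows. This is a minimal illustration and not the authors' code: the data are synthetic, a depth-1 decision stump stands in for a full CART tree, and all function names (`fit_stump`, `predict_stump`, `bagged_oob_predictions`, `sens_spec_acc`) are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a depth-1 tree (a stand-in for CART, assumed here for brevity).

    Returns (feature index, threshold, polarity) with the fewest training errors.
    polarity=1 predicts class 1 when x[j] > t; polarity=0 predicts class 1 when x[j] <= t.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, 0):
                errors = sum(
                    ((row[j] > t) == bool(polarity)) != bool(label)
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    j, t, polarity = stump
    return int((row[j] > t) == bool(polarity))


def bagged_oob_predictions(X, y, n_boot=100, seed=0):
    """Majority vote over the trees for which each subject was NOT in the bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]  # votes[i] = [votes for class 0, votes for class 1]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n subjects with replacement
        in_bag = set(idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # test only on the unselected subjects
                votes[i][predict_stump(stump, X[i])] += 1
    # Ties go to class 0; None marks a subject that was never left out of a sample.
    return [None if v0 + v1 == 0 else int(v1 > v0) for v0, v1 in votes]


def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity and accuracy, skipping subjects without a prediction."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    acc = (tp + tn) / len(pairs) if pairs else float("nan")
    return sens, spec, acc


# Synthetic data (an assumption for illustration only): two features, noisy binary outcome.
data_rng = random.Random(42)
X_demo = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(120)]
y_demo = [int(row[0] + data_rng.gauss(0, 1) > 0) for row in X_demo]

# Apparent performance: the same data are used to fit and to test.
full_stump = fit_stump(X_demo, y_demo)
apparent = sens_spec_acc(y_demo, [predict_stump(full_stump, r) for r in X_demo])

# Bagged performance: each subject is judged only by trees that never saw it.
oob_pred = bagged_oob_predictions(X_demo, y_demo, n_boot=100, seed=0)
bagged = sens_spec_acc(y_demo, oob_pred)

print("apparent (sens, spec, acc):", apparent)
print("bagged   (sens, spec, acc):", bagged)
```

Comparing the two printed triples shows the kind of gap the summary reports between the whole-data estimate and the bagged estimate. The exact values depend on the synthetic data and the seed, and say nothing about the study's own results.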