Model selection procedures in social research: Monte-Carlo simulation results

Bibliographic Details
Published in Journal of Applied Statistics, Vol. 35, no. 10, pp. 1093-1114
Main Authors Raffalovich, Lawrence E., Deane, Glenn D., Armstrong, David, Tsao, Hui-Shien
Format Journal Article
Language English
Published Abingdon Routledge 01.10.2008
Taylor and Francis Journals
Taylor & Francis Ltd
Series Journal of Applied Statistics
ISSN 0266-4763
1360-0532
DOI 10.1080/03081070802203959

Summary: Model selection strategies play an important, if not explicit, role in quantitative research. The inferential properties of these strategies are largely unknown; therefore, there is little basis for recommending (or avoiding) any particular set of strategies. In this paper, we evaluate several commonly used model selection procedures [Bayesian information criterion (BIC), adjusted R², Mallows' Cp, Akaike information criterion (AIC), AICc, and stepwise regression] using Monte-Carlo simulation of model selection when the true data generating processes (DGP) are known. We find that the ability of these selection procedures to include important variables and exclude irrelevant variables increases with the size of the sample and decreases with the amount of noise in the model. None of the model selection procedures do well in small samples, even when the true DGP is largely deterministic; thus, data mining in small samples should be avoided entirely. Instead, the implicit uncertainty in model specification should be explicitly discussed. In large samples, BIC is better than the other procedures at correctly identifying most of the generating processes we simulated, and stepwise does almost as well. In the absence of strong theory, both BIC and stepwise appear to be reasonable model selection strategies in large samples. Under the conditions simulated, adjusted R², Mallows' Cp, AIC, and AICc are clearly inferior and should be avoided.
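
The kind of experiment the summary describes can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the data generating process (two real predictors, three irrelevant ones, unit coefficients, Gaussian noise), the all-subsets search, the Gaussian form of the criteria, and the names `select_by_ic` and `recovery_rate` are all assumptions made here for illustration, and only AIC and BIC are compared.

```python
import itertools
import numpy as np


def select_by_ic(X, y, penalty):
    """Return the column subset minimizing n*log(RSS/n) + penalty*k.

    This is the Gaussian-likelihood form of an information criterion for OLS:
    penalty=2 gives AIC, penalty=log(n) gives BIC (up to a shared constant).
    """
    n, p = X.shape
    best_score, best_subset = np.inf, ()
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            # Design matrix: intercept plus the candidate columns.
            design = np.column_stack([np.ones(n), X[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            k = design.shape[1]  # number of fitted coefficients
            score = n * np.log(rss / n) + penalty * k
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset


def recovery_rate(n, noise_sd, reps=100, seed=0):
    """Share of replications in which each criterion picks exactly the true subset."""
    rng = np.random.default_rng(seed)
    # Assumed DGP (illustrative only): 2 real predictors, 3 irrelevant ones.
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
    true_subset = tuple(int(i) for i in np.flatnonzero(beta))
    hits = {"AIC": 0, "BIC": 0}
    for _ in range(reps):
        X = rng.normal(size=(n, beta.size))
        y = X @ beta + rng.normal(scale=noise_sd, size=n)
        hits["AIC"] += select_by_ic(X, y, 2.0) == true_subset
        hits["BIC"] += select_by_ic(X, y, np.log(n)) == true_subset
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    # Compare a small and a large sample at the same noise level.
    for n in (20, 500):
        print(n, recovery_rate(n, noise_sd=1.0))
```

Varying `n` and `noise_sd` gives a rough feel for how exact recovery of the true model depends on sample size and noise under this one assumed DGP; it does not reproduce the paper's results.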