Model selection procedures in social research: Monte-Carlo simulation results

Bibliographic Details
Published in	Journal of Applied Statistics, Vol. 35, no. 10, pp. 1093-1114
Main Authors	Raffalovich, Lawrence E.; Deane, Glenn D.; Armstrong, David; Tsao, Hui-Shien
Format	Journal Article
Language	English
Published	Abingdon: Routledge (Taylor & Francis), 1 October 2008
Series	Journal of Applied Statistics
Subjects	AIC; Bayesian analysis; BIC; Data processing; Model selection; Monte Carlo simulation; Social research; Stepwise regression
ISSN	0266-4763; 1360-0532
DOI	10.1080/03081070802203959

More Information
Summary:	Model selection strategies play an important, if not explicit, role in quantitative research. The inferential properties of these strategies are largely unknown; therefore, there is little basis for recommending (or avoiding) any particular set of strategies. In this paper, we evaluate several commonly used model selection procedures [Bayesian information criterion (BIC), adjusted R², Mallows' Cp, Akaike information criterion (AIC), AICc, and stepwise regression] using Monte-Carlo simulation of model selection when the true data-generating processes (DGPs) are known. We find that the ability of these selection procedures to include important variables and exclude irrelevant variables increases with the size of the sample and decreases with the amount of noise in the model. None of the model selection procedures does well in small samples, even when the true DGP is largely deterministic; thus, data mining in small samples should be avoided entirely. Instead, the implicit uncertainty in model specification should be explicitly discussed. In large samples, BIC is better than the other procedures at correctly identifying most of the generating processes we simulated, and stepwise regression does almost as well. In the absence of strong theory, both BIC and stepwise regression appear to be reasonable model selection strategies in large samples. Under the conditions simulated, adjusted R², Mallows' Cp, AIC, and AICc are clearly inferior and should be avoided.
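The kind of comparison described in the summary can be illustrated with a small simulation. What follows is a minimal sketch in Python, not the authors' code: the data-generating process (an intercept, two relevant and two irrelevant standard-normal predictors), the noise level, sample sizes and replicate count are hypothetical choices, and the helper names simulate, select and hit_rates are illustrative. Each criterion is computed for every predictor subset from the Gaussian log-likelihood with constants dropped, with k counting regression coefficients including the intercept (conventions differ, for example on whether the error variance is counted). Selection takes the minimum AIC, AICc, BIC and Cp and the maximum adjusted R², which are common but assumed rules. Stepwise regression is omitted to keep the example short.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


# Hypothetical DGP: intercept 1, x1 and x2 relevant, x3 and x4 irrelevant.
BETA = (1.0, 2.0, 1.0, 0.0, 0.0)
TRUE_MODEL = (0, 1)  # indices of the relevant predictors (x1, x2)


def simulate(n, sigma, rng):
    """Draw one sample of size n from the hypothetical DGP with N(0, sigma^2) noise."""
    rows, ys = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(BETA) - 1)]
        rows.append(x)
        ys.append(sum(b * xi for b, xi in zip(BETA, x)) + rng.gauss(0.0, sigma))
    return rows, ys


def select(rows, ys):
    """Return the predictor subset chosen by each criterion over all subsets."""
    n, k_all = len(ys), len(rows[0])
    # Cross-products are computed once; every subset's OLS fit reuses them.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k_all)] for i in range(k_all)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k_all)]
    yty = sum(y * y for y in ys)
    tss = yty - sum(ys) ** 2 / n

    def rss(subset):
        idx = [0] + [j + 1 for j in subset]  # intercept is always included
        b = solve([[xtx[i][j] for j in idx] for i in idx], [xty[i] for i in idx])
        return yty - sum(bi * xty[i] for bi, i in zip(b, idx))  # RSS = y'y - b'X'y

    p = k_all - 1
    subsets = [s for size in range(p + 1) for s in itertools.combinations(range(p), size)]
    fits = {s: rss(s) for s in subsets}
    sigma2_full = fits[tuple(range(p))] / (n - k_all)  # error variance from the full model

    scores = {"AIC": {}, "AICc": {}, "BIC": {}, "adjR2": {}, "Cp": {}}
    for s, r in fits.items():
        k = len(s) + 1  # regression coefficients, intercept included
        fit_term = n * math.log(r / n)  # -2 log-likelihood up to an additive constant
        scores["AIC"][s] = fit_term + 2 * k
        scores["AICc"][s] = fit_term + 2 * k + 2 * k * (k + 1) / (n - k - 1)
        scores["BIC"][s] = fit_term + k * math.log(n)
        # Negate adjusted R^2 so that, like the others, smaller is better.
        scores["adjR2"][s] = -(1 - (r / (n - k)) / (tss / (n - 1)))
        scores["Cp"][s] = r / sigma2_full - n + 2 * k
    return {name: min(vals, key=vals.get) for name, vals in scores.items()}


def hit_rates(n, sigma, reps=50, seed=0):
    """Share of replicates in which each criterion recovers exactly TRUE_MODEL."""
    rng = random.Random(seed)
    hits = dict.fromkeys(["AIC", "AICc", "BIC", "adjR2", "Cp"], 0)
    for _ in range(reps):
        rows, ys = simulate(n, sigma, rng)
        for name, chosen in select(rows, ys).items():
            hits[name] += chosen == TRUE_MODEL
    return {name: h / reps for name, h in hits.items()}


if __name__ == "__main__":
    for n in (30, 1000):
        print(n, hit_rates(n, sigma=1.0))
```

Running the script prints, for a small and a large sample, the share of replicates in which each criterion picks exactly {x1, x2}. The figures depend on the seed and on the assumed DGP and are not the paper's results; varying n and sigma gives a quick, informal feel for how sample size and noise affect recovery of the true model.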