A Bayesian Approach to Multicollinearity and the Simultaneous Selection and Clustering of Predictors in Linear Regression

Bibliographic Details
Published in: Journal of Statistical Theory and Practice, Vol. 5, no. 4, pp. 715-735
Main Authors: Curtis, S. McKay; Ghosh, Sujit K.
Format: Journal Article
Language: English
Published: Cham: Taylor & Francis Group / Springer International Publishing, 01.12.2011
ISSN: 1559-8608, 1559-8616
DOI: 10.1080/15598608.2011.10483741

Summary: High correlation among predictors has long been an annoyance in regression analysis. The crux of the problem is that the linear regression model assumes each predictor has an independent effect on the response that can be captured by the predictor's regression coefficient. When predictors are highly correlated, the data contain little information about the independent effect of each predictor. As a result, high correlation among predictors can produce large standard errors for the regression coefficients, as well as coefficients whose signs are opposite to those expected from a priori subject-matter theory. We propose a Bayesian model that accounts for correlation among the predictors by simultaneously selecting and clustering them. Our model combines a Dirichlet process prior and a variable selection prior for the regression coefficients. In our model, highly correlated predictors can be grouped together by setting their corresponding coefficients exactly equal, and redundant predictors can be removed from the model through the variable selection component of the prior. We demonstrate the competitiveness of our method through simulation studies and an analysis of real data.
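The summary does not give the exact form of the prior, so the following is only a hypothetical sketch of why combining a variable-selection component with a Dirichlet process yields coefficients that are either exactly zero (predictor removed) or exactly tied (predictors clustered). The function `draw_coefficients`, its parameters `inclusion_prob`, `alpha` and `base_sd`, and the Normal base measure are assumptions made for illustration and are not the authors' specification. The Dirichlet process is sampled with the standard Polya urn (Chinese restaurant process) scheme.

```python
import random
from typing import List


def draw_coefficients(p: int, inclusion_prob: float, alpha: float,
                      base_sd: float, rng: random.Random) -> List[float]:
    """Draw p coefficients from a HYPOTHETICAL selection + clustering prior.

    Assumed prior (illustrative only, not taken from the paper):
      * each coefficient is exactly 0 with probability 1 - inclusion_prob
        (the variable-selection component);
      * nonzero coefficients follow DP(alpha, Normal(0, base_sd**2)),
        sampled via the Polya urn, so several predictors can receive
        exactly the same coefficient value.
    """
    cluster_values: List[float] = []  # distinct nonzero coefficient values
    cluster_sizes: List[int] = []     # number of predictors sharing each value
    betas: List[float] = []
    for _ in range(p):
        if rng.random() >= inclusion_prob:
            betas.append(0.0)         # predictor excluded from the model
            continue
        n_seated = sum(cluster_sizes)
        # Polya urn: join existing cluster k w.p. size_k / (n + alpha),
        # or open a new cluster w.p. alpha / (n + alpha).
        u = rng.random() * (n_seated + alpha)
        cumulative = 0.0
        chosen = None
        for k, size in enumerate(cluster_sizes):
            cumulative += size
            if u < cumulative:
                chosen = k
                break
        if chosen is None:
            # New cluster: draw its value from the base measure.
            cluster_values.append(rng.gauss(0.0, base_sd))
            cluster_sizes.append(1)
            chosen = len(cluster_values) - 1
        else:
            cluster_sizes[chosen] += 1
        betas.append(cluster_values[chosen])
    return betas


if __name__ == "__main__":
    rng = random.Random(2011)
    beta = draw_coefficients(p=10, inclusion_prob=0.6, alpha=1.0,
                             base_sd=2.0, rng=rng)
    zeros = sum(b == 0.0 for b in beta)
    distinct_nonzero = len({b for b in beta if b != 0.0})
    print(beta)
    print(f"{zeros} excluded, {distinct_nonzero} distinct nonzero values")
```

In this sketch a small `alpha` favours few clusters (many tied coefficients) and a large `alpha` favours many distinct values, while `inclusion_prob` controls how many predictors are dropped; how the paper sets or learns these quantities is not stated in the summary.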