Penalized regression combining the L₁ norm and a correlation based penalty
Published in: Sankhyā. Series B (2008), Vol. 76, No. 1, pp. 82–102
Main Authors:
Format: Journal Article
Language: English
Published: Springer India, 01.05.2014
Summary: We consider the problem of feature selection in a linear regression model with p covariates and n observations. We propose a new method to simultaneously select variables and favor a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The method is based on penalized least squares with a penalty function that combines the L₁ norm and a Correlation based Penalty (CP). We call it the L1CP method. Like the Lasso penalty, L1CP shrinks some coefficients to exactly zero; in addition, the CP term explicitly links the strength of penalization to the correlation among predictors. A detailed simulation study in small and high-dimensional settings illustrates the advantages of our approach compared to several alternatives. Finally, we apply the methodology to two real data sets: US Crime Data and GC-Retention PAC data. In terms of prediction accuracy and estimation error, our empirical study suggests that L1CP is better adapted than the Elastic-Net to situations where p ≤ n (the number of variables is less than or equal to the sample size). When p ≫ n, our method remains competitive and also allows the selection of more than n variables.
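The summary describes the L1CP criterion only verbally. As a hedged illustration, not taken from the record itself (the exact form of the CP term is an assumption modelled on correlation-based penalties in this literature, and λ₁, λ₂ and ρ_{ij} are notation introduced for this sketch), a penalized least-squares criterion combining an L₁ term with a correlation-driven term could look like

$$
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p}\;
\|y - X\beta\|_2^2
\;+\; \lambda_1 \sum_{j=1}^{p} |\beta_j|
\;+\; \lambda_2 \sum_{i<j} \left[ \frac{(\beta_i - \beta_j)^2}{1 - \rho_{ij}} + \frac{(\beta_i + \beta_j)^2}{1 + \rho_{ij}} \right],
$$

where ρ_{ij} denotes the sample correlation between predictors i and j. In a criterion of this shape, the L₁ term sets some coefficients exactly to zero, while the second term penalizes differences between the coefficients of strongly positively correlated predictors (and their sums when the correlation is negative) ever more heavily as |ρ_{ij}| approaches 1, which is one way the grouping effect mentioned in the summary can arise.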
ISSN: 0976-8386; 0976-8394
DOI: 10.1007/s13571-013-0065-4