The Variational Garrote

Bibliographic Details
Published in: Machine Learning, Vol. 96, No. 3, pp. 269–294
Main Authors: Kappen, Hilbert J.; Gómez, Vicenç
Format: Journal Article
Language: English
Published: New York: Springer US, 01.09.2014

Summary: We analyze the variational method for sparse regression using ℓ0 regularization. The variational approximation results in a model that is similar to Breiman's Garrote model; we refer to this method as the Variational Garrote (VG). The VG has the effect of making the problem effectively of maximal rank even when the number of samples is small compared to the number of variables. We propose a naive mean-field approximation combined with a maximum a posteriori (MAP) approach to estimate the model parameters, and use an annealing and reheating schedule for the sparsity hyper-parameter to avoid local minima. The hyper-parameter is set by cross-validation. We compare the VG with the lasso, ridge regression, and the recently introduced Bayesian paired mean field method (PMF) (Titsias and Lázaro-Gredilla, Advances in Neural Information Processing Systems, vol. 24, pp. 2339–2347, 2011). For a fair comparison, we implemented a similar annealing-reheating schedule for the PMF sparsity parameter. Numerical results show that the VG and PMF yield more accurate predictions and reconstruct the true model more accurately than the other methods. The VG finds correct solutions when the lasso solution is inconsistent due to large input correlations. In the experiments we consider, the VG, although based on a simpler approximation than the PMF, yields qualitatively similar or better results and is computationally more efficient. A naive implementation of the VG scales cubically with the number of features. By introducing Lagrange multipliers we obtain a dual formulation of the problem that scales cubically in the number of samples, but close to linearly in the number of features.
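To make the structure described in the abstract concrete, the following is a minimal sketch in Python (NumPy) of an alternating scheme of this kind: a naive mean-field posterior over binary inclusion variables s_i in {0, 1} for the model y ≈ X(s ∘ w), combined with MAP updates for the weights w, while a sparsity parameter gamma is annealed. The function name vg_fit, the damping factor, the fixed noise variance sigma2, the specific update equations, and the toy data are illustrative assumptions for this sketch, not the paper's exact algorithm (which also estimates the noise variance, uses a reheating schedule, and offers a dual formulation for the many-features regime).

    import numpy as np

    def sigmoid(z):
        # Numerically safe logistic function
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

    def vg_fit(X, y, gammas, sigma2=1.0, n_inner=50, damping=0.5):
        # Illustrative Variational-Garrote-style fit (not the paper's exact updates).
        # Model: y ~ N(X (s * w), sigma2 I), s_i in {0, 1}, prior p(s_i) prop. exp(gamma s_i).
        # Naive mean field: q(s) = prod_i m_i^{s_i} (1 - m_i)^{1 - s_i}.
        n, p = X.shape
        G = X.T @ X                        # Gram matrix of the features
        Xy = X.T @ y
        col_sq = np.diag(G).copy()         # squared column norms ||x_i||^2
        m = np.full(p, 0.5)                # inclusion probabilities E_q[s_i]
        w = np.zeros(p)
        for gamma in gammas:               # anneal the sparsity parameter
            for _ in range(n_inner):
                # MAP update for w given m: minimize E_q ||y - X (s * w)||^2,
                # a linear system built from E_q[s s^T] = m m^T + diag(m (1 - m)).
                Ess = np.outer(m, m)
                Ess[np.diag_indices(p)] = m
                A = G * Ess + 1e-6 * np.eye(p)   # small jitter for stability
                w = np.linalg.solve(A, m * Xy)
                # Mean-field fixed point for m given w: a sigmoid of gamma minus
                # the derivative of the expected squared error w.r.t. m_i.
                r = y - X @ (m * w)              # residual at the mean
                logit = (gamma
                         + w * (X.T @ r) / sigma2
                         - (1.0 - 2.0 * m) * w**2 * col_sq / (2.0 * sigma2))
                m = (1.0 - damping) * m + damping * sigmoid(logit)
        return m, w

    # Toy usage: recover a sparse signal with fewer samples than features,
    # annealing gamma from a flat prior (0) toward strong sparsity (negative).
    rng = np.random.default_rng(0)
    n, p = 50, 200
    X = rng.standard_normal((n, p))
    w_true = np.zeros(p)
    w_true[:5] = 2.0
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    m, w = vg_fit(X, y, gammas=np.linspace(0.0, -8.0, 20))
    print(np.flatnonzero(m > 0.5))       # features the mean-field posterior keeps

In this sketch the predictive mean is X @ (m * w), and thresholding m at 0.5 is one possible hard selection rule; the paper's cross-validation of the sparsity hyper-parameter and its dual formulation for the case of many more features than samples are omitted for brevity.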
ISSN: 0885-6125
EISSN: 1573-0565
DOI: 10.1007/s10994-013-5427-7