Pegasos: primal estimated sub-gradient solver for SVM

Bibliographic Details
Published in	Mathematical Programming, Vol. 127, no. 1, pp. 3-30
Main Authors	Shalev-Shwartz, Shai; Singer, Yoram; Srebro, Nathan; Cotter, Andrew
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer-Verlag, 01.03.2011 (Springer Nature B.V.)
Subjects	Accuracy, Algorithms, Bias, Calculus of Variations and Optimal Control; Optimization, Combinatorics, Datasets, Decomposition, Full Length Paper, Iterative methods, Kernels, Learning, Mathematical analysis, Mathematical and Computational Physics, Mathematical Methods in Physics, Mathematical programming, Mathematics, Mathematics and Statistics, Mathematics of Computing, Methods, Numerical Analysis, Optimization, Solvers, Stochastic models, Studies, Support vector machines, Theoretical, Training, SVM, Stochastic gradient descent
Summary:	We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the run-time does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
ISSN:	0025-5610, 1436-4646
DOI:	10.1007/s10107-010-0420-4
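
Illustrative sketch:	The summary describes the method only at a high level and does not give the update rule. As a minimal sketch, assuming the commonly cited form of the Pegasos update for a linear kernel (one randomly chosen training example per iteration, step size 1/(λt), hinge-loss sub-gradient, optional projection step omitted), the iteration might look like the code below. The names `pegasos_train` and `predict`, the default parameter values, and the toy data are illustrative assumptions and do not come from this record.

```python
import random


def pegasos_train(examples, labels, lam=0.1, num_iters=2000, seed=0):
    """Sketch of a Pegasos-style stochastic sub-gradient solver (linear SVM).

    examples: list of feature lists; labels: list of +1 / -1 values.
    lam is the regularization parameter (lambda in the summary).
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0])
    for t in range(1, num_iters + 1):
        # Each iteration looks at a single training example.
        i = rng.randrange(len(examples))
        x, y = examples[i], labels[i]
        eta = 1.0 / (lam * t)  # assumed step size: 1 / (lambda * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        shrink = 1.0 - eta * lam  # contribution of the regularization term
        if margin < 1.0:
            # Hinge loss is active: shrink w and step toward y * x.
            w = [shrink * wj + eta * y * xj for wj, xj in zip(w, x)]
        else:
            # Hinge loss is zero: only the regularization shrinkage applies.
            w = [shrink * wj for wj in w]
    return w


def predict(w, x):
    """Classify x by the sign of the inner product <w, x>."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical toy data: two linearly separable clusters.
toy_x = [[2.0, 1.0], [1.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-1.0, -2.0], [-3.0, -3.0]]
toy_y = [1, 1, 1, -1, -1, -1]
weights = pegasos_train(toy_x, toy_y)
print(weights, [predict(weights, x) for x in toy_x])
```

The cost of one iteration here depends only on the number of features in the sampled example, not on the size of the training set, which is the property the summary highlights for large datasets. A sparse feature representation would be needed to match the bound in terms of d; this dense version is kept simple for readability.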