Understanding generalization error of SGD in nonconvex optimization
| Published in | *Machine Learning*, Vol. 111, No. 1, pp. 345–375 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 2022 (Springer Nature B.V.) |
| Subjects | |
Summary: The success of deep learning has led to rising interest in the generalization properties of the stochastic gradient descent (SGD) method, and stability is one popular approach to studying them. Existing stability-based generalization bounds do not incorporate the interplay between the optimization of SGD and the underlying data distribution, and hence cannot even capture the effect of randomized labels on generalization performance. In this paper, we establish generalization error bounds for SGD by characterizing the corresponding stability in terms of the on-average variance of the stochastic gradients. These characterizations lead to improved bounds on the generalization error of SGD and experimentally explain the effect of random labels on generalization performance. We also study the regularized risk minimization problem with strongly convex regularizers and obtain improved generalization error bounds for proximal SGD.
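The summary's mention of proximal SGD with a strongly convex regularizer can be made concrete with a short sketch. The snippet below is purely illustrative and is not code from the paper: the L2 regularizer, the step-size schedule, and the toy least-squares loss in the usage example are all assumptions chosen for simplicity, and the gradient oracle `grad_fi` could equally come from a nonconvex loss, which is the setting the paper analyzes.

```python
import numpy as np

def prox_l2(w, eta, lam):
    # Proximal map of the strongly convex regularizer r(w) = (lam / 2) * ||w||^2:
    # argmin_v { eta * (lam / 2) * ||v||^2 + (1 / 2) * ||v - w||^2 } = w / (1 + eta * lam)
    return w / (1.0 + eta * lam)

def proximal_sgd(grad_fi, n, w0, lam=0.1, eta0=0.1, T=1000, seed=0):
    """Proximal SGD for regularized empirical risk minimization.

    At each iteration, sample one training example uniformly at random,
    take a stochastic gradient step on the (possibly nonconvex) loss,
    then apply the proximal map of the regularizer.  grad_fi(w, i)
    returns the gradient of the loss on example i at the point w.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for t in range(1, T + 1):
        eta = eta0 / t              # illustrative diminishing step size
        i = rng.integers(n)         # uniform sampling of a single example
        w = prox_l2(w - eta * grad_fi(w, i), eta, lam)
    return w

# Usage on a toy least-squares problem (again, illustrative only):
rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 5)), rng.normal(size=50)
grad = lambda w, i: (X[i] @ w - y[i]) * X[i]
w_hat = proximal_sgd(grad, n=50, w0=np.zeros(5))
print(w_hat)
```

In this notation, the on-average variance of the stochastic gradients that the paper's bounds are built around corresponds to the variance of `grad_fi(w, i)` over the random index `i`, averaged along the optimization trajectory.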
ISSN: 0885-6125 (print); 1573-0565 (electronic)
DOI: 10.1007/s10994-021-06056-w