Generalization Bounds for Label Noise Stochastic Gradient Descent
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension \(d\). Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound scales polynomially with \(d\) and decays at the rate \(n^{-2/3}\), where \(n\) is the sample size. This rate improves on the best-known rate of \(n^{-1/2}\) established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
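To make the contrast in the abstract concrete, below is a minimal sketch of the two update rules on a hypothetical toy tanh regression problem; the model, step size, noise scales, and batch size are illustrative assumptions, not taken from the paper. Label-noise SGD perturbs the training labels, so the injected noise enters through the model's gradient and depends on the current parameters, whereas SGLD adds isotropic Gaussian noise that is independent of the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: regression with h(x; theta) = tanh(x @ theta)
# and squared loss. All constants below are illustrative assumptions.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

eta = 0.05    # constant learning rate
sigma = 0.5   # label-noise scale
beta = 10.0   # SGLD inverse temperature
batch = 32

def grad(theta, Xb, yb):
    """Mini-batch gradient of the mean squared loss 0.5 * (h - y)^2."""
    h = np.tanh(Xb @ theta)
    jac = (1.0 - h**2)[:, None] * Xb   # per-sample d h / d theta
    return jac.T @ (h - yb) / len(yb)

theta_ln = np.zeros(d)    # label-noise SGD iterate
theta_sgld = np.zeros(d)  # SGLD iterate

for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]

    # Label-noise SGD: perturb the labels. The injected noise is the
    # batch average of eta * eps_i * grad_theta h(x_i; theta), so its
    # covariance depends on the current iterate (parameter-dependent).
    eps = sigma * rng.normal(size=batch)
    theta_ln -= eta * grad(theta_ln, Xb, yb + eps)

    # SGLD: the same gradient step plus parameter-independent
    # isotropic Gaussian noise of scale sqrt(2 * eta / beta).
    theta_sgld -= eta * grad(theta_sgld, Xb, yb)
    theta_sgld += np.sqrt(2.0 * eta / beta) * rng.normal(size=d)
```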
Published in | arXiv.org |
---|---|
Main Authors | , |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 01.11.2023 |
ISSN | 2331-8422 |