Generalization Bounds for Label Noise Stochastic Gradient Descent
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension \(d\). Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound scales polynomially with \(d\) and decays at the rate \(n^{-2/3}\), where \(n\) is the sample size. This rate improves on the best-known rate of \(n^{-1/2}\) established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
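To make the contrast in the abstract concrete, below is a minimal sketch of the two update rules on a hypothetical toy tanh regression problem; the model, step size, noise scales, and batch size are illustrative assumptions, not taken from the paper. Label-noise SGD perturbs the training labels, so the injected noise enters through the model's gradient and depends on the current parameters, whereas SGLD adds isotropic Gaussian noise that is independent of the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: regression with h(x; theta) = tanh(x @ theta)
# and squared loss. All constants below are illustrative assumptions.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

eta = 0.05    # constant learning rate
sigma = 0.5   # label-noise scale
beta = 10.0   # SGLD inverse temperature
batch = 32

def grad(theta, Xb, yb):
    """Mini-batch gradient of the mean squared loss 0.5 * (h - y)^2."""
    h = np.tanh(Xb @ theta)
    jac = (1.0 - h**2)[:, None] * Xb   # per-sample d h / d theta
    return jac.T @ (h - yb) / len(yb)

theta_ln = np.zeros(d)    # label-noise SGD iterate
theta_sgld = np.zeros(d)  # SGLD iterate

for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]

    # Label-noise SGD: perturb the labels. The injected noise is the
    # batch average of eta * eps_i * grad_theta h(x_i; theta), so its
    # covariance depends on the current iterate (parameter-dependent).
    eps = sigma * rng.normal(size=batch)
    theta_ln -= eta * grad(theta_ln, Xb, yb + eps)

    # SGLD: the same gradient step plus parameter-independent
    # isotropic Gaussian noise of scale sqrt(2 * eta / beta).
    theta_sgld -= eta * grad(theta_sgld, Xb, yb)
    theta_sgld += np.sqrt(2.0 * eta / beta) * rng.normal(size=d)
```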
Published in | arXiv.org |
---|---|
Main Authors | , |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 01.11.2023 |
ISSN | 2331-8422 |