Unified Optimal Analysis of the (Stochastic) Gradient Method
| Main Author | |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 09.07.2019 |
| Subjects | |
Summary: In this note we give a simple proof of the convergence of stochastic gradient (SGD) methods on $\mu$-convex functions under an $L$-smoothness assumption milder than the standard one. We show that for carefully chosen stepsizes SGD converges after $T$ iterations as $O\left( LR^2 \exp\bigl[-\frac{\mu}{4L}T\bigr] + \frac{\sigma^2}{\mu T} \right)$, where $\sigma^2$ measures the variance of the stochastic noise and $R$ denotes the initial distance to the optimum. For deterministic gradient descent (GD), and for SGD in the interpolation setting, we have $\sigma^2 = 0$ and recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
DOI: 10.48550/arxiv.1907.04232
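To make the stated rate concrete, here is a minimal simulation sketch: SGD on a $\mu$-strongly convex quadratic with additive gradient noise, using a constant-then-decreasing stepsize schedule in the spirit of the "carefully chosen stepsizes" above. The problem, the noise model, and the schedule constants (`1/(2L)` for the first half, then `2/(mu*(kappa + t - t0))`) are illustrative assumptions for this sketch, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem: f(x) = 0.5 * x^T diag(h) x, a mu-strongly convex, L-smooth quadratic
# with minimizer x* = 0, so R^2 = ||x_0||^2.
d = 20
mu, L = 0.1, 1.0
h = np.linspace(mu, L, d)   # Hessian eigenvalues in [mu, L]
sigma = 0.5                 # per-coordinate noise stdev; the total variance
                            # sigma^2 * d plays the role of sigma^2 in the bound

def stoch_grad(x):
    """Unbiased gradient oracle: exact gradient plus additive Gaussian noise."""
    return h * x + sigma * rng.standard_normal(d)

def sgd(T):
    """T steps of SGD with a constant-then-decreasing stepsize (assumed schedule)."""
    x = np.ones(d)
    t0 = T // 2
    kappa = 4 * L / mu      # chosen so the schedule is continuous at t = t0
    for t in range(T):
        gamma = 1 / (2 * L) if t < t0 else 2 / (mu * (kappa + t - t0))
        x -= gamma * stoch_grad(x)
    return np.sum(x**2)     # squared distance to the optimum x* = 0

for T in [100, 1000, 10000]:
    err = np.mean([sgd(T) for _ in range(20)])
    print(f"T={T:6d}  E||x_T - x*||^2 ~ {err:.4f}"
          f"  noise floor sigma^2*d/(mu*T) = {sigma**2 * d / (mu * T):.4f}")
```

For small $T$ the exponential term $LR^2 \exp\bigl[-\frac{\mu}{4L}T\bigr]$ dominates; once it has died out, the averaged error should track the $\sigma^2/(\mu T)$ noise floor, shrinking roughly tenfold for each tenfold increase in $T$, which is the $O(1/T)$ tail of the bound.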