Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
Format | Journal Article |
Language | English |
Published | 02.08.2021 |
Summary: Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time approximations, and a rigorous treatment of the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of the optimizer. We then specialize this result by exploiting the Markovian structure of stochastic optimizers and derive bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.
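To give a concrete sense of the quantity the summary refers to, a lower tail exponent alpha describes how the distribution of optimizer step sizes behaves near zero, roughly P(||x_{k+1} - x_k|| <= r) ~ C * r^alpha as r -> 0. The sketch below is a hypothetical estimator of alpha via a log-log fit of the empirical CDF on its smallest quantile; it is an illustration under this assumed power-law model, not the estimation procedure used in the paper.

```python
# Hedged sketch: estimate a lower tail exponent alpha from a sample of
# positive values (e.g. norms of optimizer increments), assuming the
# lower-tail power law P(X <= r) ~ C * r^alpha near r = 0.
# Illustrative only -- not the paper's exact procedure.
import numpy as np


def lower_tail_exponent(values, quantile=0.1):
    """Fit log F(r) ~ alpha * log r + const on the smallest `quantile` of the data."""
    r = np.sort(np.asarray(values, dtype=float))
    r = r[r > 0]                          # log requires strictly positive values
    n = len(r)
    k = max(int(quantile * n), 10)        # use only the smallest k observations
    small = r[:k]
    ecdf = np.arange(1, k + 1) / n        # empirical CDF at the sorted points
    slope, _ = np.polyfit(np.log(small), np.log(ecdf), 1)
    return slope


# Synthetic sanity check: for |Z| with Z ~ Normal(0, 1), the CDF near zero is
# approximately sqrt(2/pi) * r, so the lower tail exponent should be about 1.
rng = np.random.default_rng(0)
alpha_hat = lower_tail_exponent(np.abs(rng.normal(size=100_000)))
```

In practice one would apply such an estimator to the norms of consecutive parameter differences recorded while training near a local minimum; the choice of `quantile` trades bias (too large a tail window) against variance (too few points).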
DOI: 10.48550/arxiv.2108.00781