On a continuous time model of gradient descent dynamics and instability in deep learning
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.02.2023 |
Subjects | |
Online Access | Get full text |
Summary: The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent, we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge of stability phenomenon in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
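As a rough, self-contained illustration of the stability threshold behind the edge of stability discussion in the abstract (this is the classical gradient descent analysis on a quadratic, not the paper's principal flow; the function name and constants below are illustrative assumptions), gradient descent with step size eta on a quadratic with Hessian H converges only while eta * lambda_max(H) < 2:

```python
import numpy as np

# Hedged sketch: on the quadratic loss L(theta) = 0.5 * theta^T H theta,
# gradient descent is stable only while eta * lambda_max(H) < 2. Crossing
# that threshold produces the oscillatory, divergent behavior the abstract
# says the principal flow is designed to capture.

def gradient_descent(H, theta0, eta, steps=50):
    """Run plain gradient descent on 0.5 * theta^T H theta and record |theta|."""
    theta = theta0.copy()
    norms = [np.linalg.norm(theta)]
    for _ in range(steps):
        theta = theta - eta * (H @ theta)  # gradient of the quadratic is H @ theta
        norms.append(np.linalg.norm(theta))
    return np.array(norms)

H = np.diag([1.0, 25.0])            # toy Hessian with lambda_max = 25
theta0 = np.array([1.0, 1.0])

stable_eta = 0.05                   # eta * lambda_max = 1.25 < 2 -> converges
unstable_eta = 0.09                 # eta * lambda_max = 2.25 > 2 -> oscillates and grows

print(gradient_descent(H, theta0, stable_eta)[-1])    # small final norm
print(gradient_descent(H, theta0, unstable_eta)[-1])  # large final norm
```

In the unstable regime the iterate flips sign and grows along the sharpest Hessian direction, which is exactly the kind of divergent, oscillatory behavior that a standard gradient-flow approximation misses and that the abstract attributes to the PF.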
DOI: 10.48550/arxiv.2302.01952