Charting the Topography of the Neural Network Landscape with Thermal-Like Noise
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.04.2023 |
Summary: | The training of neural networks is a complex, high-dimensional, non-convex
and noisy optimization problem whose theoretical understanding is interesting
both from an applicative perspective and for fundamental reasons. A core
challenge is to understand the geometry and topography of the landscape that
guides the optimization. In this work, we employ standard Statistical Mechanics
methods, namely, phase-space exploration using Langevin dynamics, to study this
landscape for an over-parameterized fully connected network performing a
classification task on random data. Analyzing the fluctuation statistics, in
analogy to thermal dynamics at a constant temperature, we infer a clear
geometric description of the low-loss region. We find that it is a
low-dimensional manifold whose dimension can be readily obtained from the
fluctuations. Furthermore, this dimension is controlled by the number of data
points that reside near the classification decision boundary. Importantly, we
find that a quadratic approximation of the loss near the minimum is
fundamentally inadequate due to the exponential nature of the decision boundary
and the flatness of the low-loss region. This causes the dynamics to sample
regions with higher curvature at higher temperatures, while producing
quadratic-like statistics at any given temperature. We explain this behavior by
a simplified loss model which is analytically tractable and reproduces the
observed fluctuation statistics. |
DOI: | 10.48550/arxiv.2304.01335 |
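The summary's central idea is to sample the low-loss region with Langevin dynamics at a fixed temperature and read geometric information off the loss fluctuations. A toy problem can show this idea at work. The sketch below is not the authors' code. The loss, which is quadratic in a few "stiff" coordinates and exactly flat in the rest, is a hypothetical stand-in for a network loss. The helper names `langevin_loss_trace`, `equipartition_dimension` and `toy_problem` are also illustrative, as are the step size and temperature. The estimate relies on the equipartition relation <L - L_min> = D*T/2, where D is the number of curved directions. That relation holds exactly only for a quadratic basin. The summary notes that the real landscape is not quadratic but still yields quadratic-like statistics at any given temperature, which is why a fixed-temperature estimate of this kind can remain informative.

```python
import numpy as np


def langevin_loss_trace(loss, grad, theta0, lr, temperature, n_steps, burn_in, rng):
    """Overdamped Langevin dynamics (Euler-Maruyama discretization):
        theta <- theta - lr * grad(theta) + sqrt(2 * lr * T) * xi,  xi ~ N(0, I).
    Returns the loss recorded at every step after burn-in."""
    theta = np.array(theta0, dtype=float)
    noise_scale = np.sqrt(2.0 * lr * temperature)
    losses = np.empty(n_steps - burn_in)
    for step in range(n_steps):
        theta = theta - lr * grad(theta) + noise_scale * rng.standard_normal(theta.shape)
        if step >= burn_in:
            losses[step - burn_in] = loss(theta)
    return losses


def equipartition_dimension(losses, temperature, loss_min=0.0):
    """Equipartition estimate for a quadratic basin: <L - L_min> = D * T / 2,
    so D ~ 2 * <L - L_min> / T. Flat directions do not contribute to the loss.
    A finite step size biases the estimate slightly upward."""
    return 2.0 * np.mean(losses - loss_min) / temperature


def toy_problem(n_params=50, n_stiff=5, seed=0):
    """Hypothetical toy loss: quadratic in the first n_stiff coordinates,
    exactly flat in the remaining n_params - n_stiff coordinates."""
    rng = np.random.default_rng(seed)
    k = np.zeros(n_params)
    k[:n_stiff] = rng.uniform(1.0, 4.0, size=n_stiff)  # curvatures of the stiff modes

    def loss(theta):
        return 0.5 * np.sum(k * theta**2)

    def grad(theta):
        return k * theta

    return loss, grad, rng


if __name__ == "__main__":
    T = 0.1
    loss, grad, rng = toy_problem()
    trace = langevin_loss_trace(loss, grad, np.zeros(50), lr=0.01, temperature=T,
                                n_steps=100_000, burn_in=5_000, rng=rng)
    print(f"estimated number of curved directions: {equipartition_dimension(trace, T):.2f}")
```

With 5 stiff directions out of 50 parameters, the estimate should come out close to 5 regardless of how many flat directions are added. In this toy setting, loss fluctuations at a single temperature count the constrained directions without any Hessian computation. The step size is kept small relative to the largest curvature (here lr * k <= 0.04) so that the discretized dynamics stay close to the continuous-time Boltzmann distribution.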