On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 06.10.2021 |
Summary: Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.
DOI: 10.48550/arxiv.2110.02926
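For orientation, the sketch below spells out the two objects the summary leans on: the homogeneity assumption on the activation and the depth-continuum "gradient-flow PDE". All notation here ($f$, $\theta$, $\rho$, $E$, the $1/L$ scaling) and the reading of "partially $1$-homogeneous" are illustrative assumptions drawn from the broader mean-field literature, not from this record.

```latex
% Minimal sketch of the summary's setup; notation is assumed, not quoted.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% p-homogeneity: g is p-homogeneous when scaling the input by lambda
% scales the output by lambda^p.
\[
  g(\lambda u) = \lambda^{p}\, g(u) \qquad \text{for all } \lambda > 0 .
\]
% Example: ReLU is 1-homogeneous, and its square is 2-homogeneous:
\[
  \sigma(z) = \max(z, 0), \qquad \sigma(\lambda z) = \lambda\,\sigma(z),
  \qquad \sigma(\lambda z)^{2} = \lambda^{2}\,\sigma(z)^{2}.
\]
% "Partially 1-homogeneous" is read here as 1-homogeneity in only a
% subset of the arguments; the record itself does not define the term.

% One standard mean-field reading of "translating the training process
% of ResNet to a gradient-flow PDE": the L-layer residual update
\[
  X_{l+1} = X_l + \tfrac{1}{L}\, f(X_l, \theta_l),
  \qquad l = 0, \dots, L-1,
\]
% becomes, as the depth L tends to infinity, a continuum flow driven by
% the parameter distribution rho_s at depth s,
\[
  \frac{d X_s}{d s} = \int f(X_s, \theta)\, \rho_s(d\theta),
\]
% and gradient descent on the parameters corresponds, formally, to a
% Wasserstein gradient flow on rho for the training objective E:
\[
  \partial_t \rho_t = \nabla_\theta \cdot
  \Big( \rho_t\, \nabla_\theta \frac{\delta E}{\delta \rho}(\rho_t) \Big).
\]

\end{document}
```

The paper's global-convergence claim is about this limiting flow: under the stated homogeneity assumptions, the gradient-flow PDE reaches a global minimizer, and a sufficiently deep and wide finite ResNet tracks it to the stated accuracy and confidence.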