On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 06.10.2021 |
Summary: Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.
DOI: 10.48550/arxiv.2110.02926
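For orientation, the sketch below spells out the two objects the summary leans on: the homogeneity assumption on the activation and the depth-continuum "gradient-flow PDE". All notation here ($f$, $\theta$, $\rho$, $E$, the $1/L$ scaling) and the reading of "partially $1$-homogeneous" are illustrative assumptions drawn from the broader mean-field literature, not from this record.

```latex
% Minimal sketch of the summary's setup; notation is assumed, not quoted.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% p-homogeneity: g is p-homogeneous when scaling the input by lambda
% scales the output by lambda^p.
\[
  g(\lambda u) = \lambda^{p}\, g(u) \qquad \text{for all } \lambda > 0 .
\]
% Example: ReLU is 1-homogeneous, and its square is 2-homogeneous:
\[
  \sigma(z) = \max(z, 0), \qquad \sigma(\lambda z) = \lambda\,\sigma(z),
  \qquad \sigma(\lambda z)^{2} = \lambda^{2}\,\sigma(z)^{2}.
\]
% "Partially 1-homogeneous" is read here as 1-homogeneity in only a
% subset of the arguments; the record itself does not define the term.

% One standard mean-field reading of "translating the training process
% of ResNet to a gradient-flow PDE": the L-layer residual update
\[
  X_{l+1} = X_l + \tfrac{1}{L}\, f(X_l, \theta_l),
  \qquad l = 0, \dots, L-1,
\]
% becomes, as the depth L tends to infinity, a continuum flow driven by
% the parameter distribution rho_s at depth s,
\[
  \frac{d X_s}{d s} = \int f(X_s, \theta)\, \rho_s(d\theta),
\]
% and gradient descent on the parameters corresponds, formally, to a
% Wasserstein gradient flow on rho for the training objective E:
\[
  \partial_t \rho_t = \nabla_\theta \cdot
  \Big( \rho_t\, \nabla_\theta \frac{\delta E}{\delta \rho}(\rho_t) \Big).
\]

\end{document}
```

The paper's global-convergence claim is about this limiting flow: under the stated homogeneity assumptions, the gradient-flow PDE reaches a global minimizer, and a sufficiently deep and wide finite ResNet tracks it to the stated accuracy and confidence.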