Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Bibliographic Details
Published in: arXiv.org
Main Authors: Frei, Spencer; Vardi, Gal; Bartlett, Peter L.; Srebro, Nathan; Hu, Wei
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 13.10.2022

Summary: The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an \(\ell_2\)-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
ISSN: 2331-8422
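
As an illustration of the setting described in the summary, the following minimal Python/NumPy sketch (not the authors' code) trains the first layer of a two-layer leaky ReLU network with full-batch gradient descent on nearly-orthogonal high-dimensional Gaussian data, starting from a small random initialization, and tracks the stable rank of the first-layer weight matrix. The dimensions, leaky slope, step size, initialization scale, and the choice to fix the second layer at random signs scaled by 1/sqrt(m) are all illustrative assumptions, not values taken from the paper.

# Minimal NumPy sketch (not the authors' code) of the setting in the summary:
# a two-layer leaky ReLU network trained by full-batch gradient descent on
# nearly-orthogonal high-dimensional data, tracking the stable rank
# ||W||_F^2 / ||W||_2^2 of the first-layer weight matrix. All hyperparameters
# and the fixed second layer are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 2000, 100      # samples, input dimension (d >> n), hidden width
slope = 0.5                  # leaky ReLU slope (assumed value)
lr = 0.1                     # step size (assumed value)
init_scale = 1e-4            # small first-layer initialization

# Independent Gaussian inputs in high dimension are nearly orthogonal:
# |<x_i, x_j>| is on the order of sqrt(d) while ||x_i||^2 is on the order of d.
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

W = init_scale * rng.standard_normal((m, d))      # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer (assumption)

def leaky_relu(z):
    return np.where(z > 0, z, slope * z)

def stable_rank(M):
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)

print(f"stable rank at initialization: {stable_rank(W):.1f}")

for step in range(1, 101):
    H = X @ W.T                        # pre-activations, shape (n, m)
    out = leaky_relu(H) @ a            # network outputs f(x_i), shape (n,)
    # logistic loss L = mean_i log(1 + exp(-y_i f(x_i)))
    g = -y / (1.0 + np.exp(y * out)) / n          # dL/df(x_i)
    D = np.where(H > 0, 1.0, slope)               # leaky ReLU derivatives
    grad_W = ((g[:, None] * D) * a[None, :]).T @ X
    W -= lr * grad_W
    if step in (1, 2, 10, 100):
        print(f"after step {step:3d}: stable rank of W = {stable_rank(W):.1f}")

If the picture in the summary carries over to this toy run, the stable rank should drop sharply after the first step, from a large value at initialization to a small constant, and stay small over the remaining steps; this only illustrates the claimed phenomenon and does not reproduce the paper's precise conditions or guarantees.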