Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models
Format: Journal Article
Language: English
Published: 09.03.2022
Summary: Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive, since neural networks are in general highly complex models. Why does a linear structure emerge when the networks become wide? In this work, we provide a new perspective on this "transition to linearity" by considering a neural network as an assembly model recursively built from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominates the assembly.
DOI: 10.48550/arxiv.2203.05104
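To make the "near-constant NTK" claim concrete, here is a minimal sketch, not taken from the paper: the two-layer ReLU network, widths, data, and step size are all illustrative assumptions. It estimates the empirical NTK of a network with a linear output layer before and after one gradient step; under the usual 1/sqrt(width) output scaling, the relative change of the kernel should shrink as the width grows.

```python
# Minimal sketch (illustrative, not the paper's method): measure how much the
# empirical NTK of a two-layer network with a linear output layer moves under
# one gradient step, as a function of width.
import jax
import jax.numpy as jnp

def init_params(key, width, d_in=5):
    k1, k2 = jax.random.split(key)
    # O(1) Gaussian weights; the 1/sqrt(fan_in) scaling lives in the forward pass
    return {"W": jax.random.normal(k1, (width, d_in)),
            "v": jax.random.normal(k2, (width,))}

def f(params, x):
    # Two-layer net with a linear output layer:
    # f(x) = v . relu(W x / sqrt(d_in)) / sqrt(width)
    d_in = params["W"].shape[1]
    width = params["W"].shape[0]
    h = jax.nn.relu(params["W"] @ x / jnp.sqrt(d_in))
    return params["v"] @ h / jnp.sqrt(width)

def ntk(params, xs):
    # Empirical NTK Gram matrix: K_ij = <grad_theta f(x_i), grad_theta f(x_j)>
    grads = [jax.grad(f)(params, x) for x in xs]
    flat = [jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(gr)])
            for gr in grads]
    G = jnp.stack(flat)
    return G @ G.T

kx, ky = jax.random.split(jax.random.PRNGKey(0))
xs = jax.random.normal(kx, (8, 5))   # toy inputs (assumed sizes)
ys = jax.random.normal(ky, (8,))     # toy targets

def loss(params):
    preds = jnp.array([f(params, x) for x in xs])
    return 0.5 * jnp.mean((preds - ys) ** 2)

for width in [10, 100, 1000, 10000]:
    params = init_params(jax.random.PRNGKey(1), width)
    K0 = ntk(params, xs)
    # One full-batch gradient step (step size 1.0, chosen for illustration)
    g = jax.grad(loss)(params)
    params1 = jax.tree_util.tree_map(lambda p, gi: p - 1.0 * gi, params, g)
    K1 = ntk(params1, xs)
    rel = jnp.linalg.norm(K1 - K0) / jnp.linalg.norm(K0)
    print(f"width={width:6d}  relative NTK change after one step: {rel:.4f}")
```

The 1/sqrt(width) output scaling mirrors the abstract's assembly view: each neuron is a "weak" sub-model whose individual contribution to the output vanishes as the width grows, so no single sub-model dominates and the kernel barely moves along the optimization path.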