Frequency Bias in Neural Networks for Input of Non-Uniform Density
Format | Journal Article |
Language | English |
Published | 10.03.2020 |
Summary: | Recent works have partly attributed the generalization ability of
over-parameterized neural networks to frequency bias: networks trained with
gradient descent on data drawn from a uniform distribution fit low-frequency
components of the target before high-frequency ones. As realistic training sets
are not drawn from a uniform distribution, here we use the Neural Tangent
Kernel (NTK) model to explore the effect of variable density on training
dynamics. Our results, which combine analytic and empirical observations, show
that when learning a pure harmonic function of frequency $\kappa$, convergence
at a point $x \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d / p(x))$, where
$p(x)$ denotes the local density at $x$. Specifically, for data on
$\mathbb{S}^1$ we analytically derive the eigenfunctions of the kernel
associated with the NTK for two-layer networks. We further prove convergence
results for deep, fully connected networks with respect to the spectral
decomposition of the NTK. Our empirical study highlights similarities and
differences between deep and shallow networks in this model. |
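As a rough illustration of the headline $O(\kappa^d / p(x))$ claim, the sketch
below simulates NTK-regime gradient flow on $\mathbb{S}^1$: it draws a
non-uniform sample, builds the Gram matrix of the standard bias-free two-layer
ReLU NTK (an assumed arc-cosine form from the NTK literature; the paper's
kernel may differ in details such as bias terms), evolves the residual of a
pure harmonic target in closed form via the eigendecomposition, and checks how
per-point convergence time relates to $1/p(x)$. The density
$p(\theta) \propto 1 + a\cos\theta$, the zero-initialization assumption
$f_0 = 0$, the threshold, and the name `ntk_two_layer_relu` are illustrative
choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def ntk_two_layer_relu(t):
    """Bias-free two-layer ReLU NTK on the unit sphere (assumed arc-cosine
    form, k(x, z) = t*k0(t) + k1(t) with t = <x, z>)."""
    t = np.clip(t, -1.0, 1.0)
    theta = np.arccos(t)
    k0 = (np.pi - theta) / np.pi
    k1 = (t * (np.pi - theta) + np.sqrt(1.0 - t ** 2)) / np.pi
    return t * k0 + k1

# Non-uniform sample on S^1 with density p(theta) proportional to
# 1 + a*cos(theta), drawn by rejection sampling.
n, a, kappa = 400, 0.8, 4
angles = []
while len(angles) < n:
    th = rng.uniform(0.0, 2.0 * np.pi)
    if rng.uniform() < (1.0 + a * np.cos(th)) / (1.0 + a):
        angles.append(th)
angles = np.array(angles)
density = (1.0 + a * np.cos(angles)) / (2.0 * np.pi)

X = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # points on S^1
G = ntk_two_layer_relu(X @ X.T)                         # NTK Gram matrix
y = np.cos(kappa * angles)          # pure harmonic target of frequency kappa

# Under linearized (NTK-regime) gradient flow on the squared loss with
# f_0 = 0, the residual is r(t) = exp(-t*G/n) y, computed here via eigh.
lam, V = np.linalg.eigh(G)
coeffs = V.T @ y
conv_time = np.full(n, np.inf)
for t in np.geomspace(1e-1, 1e5, 400):
    r = V @ (np.exp(-lam * t / n) * coeffs)     # residual at time t
    newly = (np.abs(r) < 0.1) & np.isinf(conv_time)
    conv_time[newly] = t                        # first time |r_i| < 0.1

# The paper's prediction: convergence time at x scales like 1/p(x).
mask = np.isfinite(conv_time)
corr = np.corrcoef(np.log(conv_time[mask]), np.log(1.0 / density[mask]))[0, 1]
print(f"log-log correlation of convergence time with 1/p(x): {corr:.2f}")
```

The eigendecomposition step exploits the closed-form solution of the
linearized dynamics, so no training loop is needed; only the per-point
threshold crossing is scanned over a time grid.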
DOI: | 10.48550/arxiv.2003.04560 |