Depth Separations in Neural Networks: Separating the Dimension from the Accuracy
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 11.02.2024 |
Subjects | |
Summary: | We prove an exponential separation between depth 2 and depth 3 neural networks when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in $[0,1]^{d}$, assuming exponentially bounded weights. This addresses an open problem posed in \citet{safran2019depth}, and proves that the curse of dimensionality manifests in depth 2 approximation even in cases where the target function can be represented efficiently using depth 3. Previously, lower bounds used to separate depth 2 from depth 3 required that at least one of the Lipschitz parameter, the target accuracy, or (some measure of) the size of the domain of approximation scales polynomially with the input dimension, whereas we fix the former two and restrict our domain to the unit hypercube. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of an average-case to worst-case random self-reducibility argument that reduces the problem to threshold circuit lower bounds. |
---|---|
DOI: | 10.48550/arxiv.2402.07248 |
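
For orientation, the separation described in the summary can be read informally along the following lines. This is only a hedged paraphrase of the abstract: the specific width bound $2^{d^{c}}$, the squared-error criterion, and the constants $c$ and $\varepsilon_{0}$ are illustrative assumptions, and the precise theorem statement appears only in the paper.

There exist a probability distribution $\mu$ supported on $[0,1]^{d}$ and an $\mathcal{O}(1)$-Lipschitz function $f$ representable by a depth 3 network of size polynomial in $d$, such that every depth 2 network $N$ with weights of magnitude at most $2^{\mathrm{poly}(d)}$ and width below $2^{d^{c}}$, for some constant $c>0$, satisfies
\[
\mathbb{E}_{\mathbf{x}\sim\mu}\!\left[\bigl(N(\mathbf{x})-f(\mathbf{x})\bigr)^{2}\right] \;\ge\; \varepsilon_{0},
\]
where $\varepsilon_{0}>0$ is a constant accuracy threshold independent of $d$ (its exact value is not taken from the paper).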