How Many Neurons Does it Take to Approximate the Maximum?
Format: Journal Article
Language: English
Published: 18.07.2023
Summary: We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximation with respect to the $L_2$ norm, for continuous input distributions, for a network that uses ReLU activations. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ construction which approximates the maximum function. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we are able to use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest as they apply to the widely studied and used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
DOI: 10.48550/arxiv.2307.09212
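
As background for the upper-bound constructions discussed in the summary, the sketch below (a hypothetical NumPy illustration, not the authors' construction) shows the standard way a ReLU network can compute a maximum exactly: the identity $\max(a, b) = \mathrm{ReLU}(a - b) + b$, applied in a tournament over the $d$ inputs. This baseline uses depth $\mathcal{O}(\log(d))$ and width $\mathcal{O}(d)$; the paper's $\mathcal{O}(\log(\log(d)))$-depth, $\mathcal{O}(d)$-width network is instead an approximate and considerably more involved construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    # Exact identity: max(a, b) = relu(a - b) + b.
    # One ReLU layer suffices, since a - b and b are affine in the inputs.
    return relu(a - b) + b

def tournament_max(x):
    """Max of d inputs via a ReLU 'tournament' of depth O(log d).

    Each round halves the number of candidates by taking pairwise maxima,
    so after ceil(log2(d)) rounds a single value remains. The width is O(d)
    in the first round and shrinks thereafter.
    """
    vals = list(x)
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            nxt.append(pairwise_max(vals[i], vals[i + 1]))
        if len(vals) % 2 == 1:  # an odd leftover passes through unchanged
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=16)
    assert np.isclose(tournament_max(x), x.max())
    print(tournament_max(x), x.max())
```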