Approximation and gradient descent training with neural networks
Published in | Sampling theory, signal processing, and data analysis Vol. 23; no. 2 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Cham: Springer International Publishing, 01.12.2025 |
Subjects | |
Summary | It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training error, these two theories are not immediately compatible. Recent work uses the smoothness that is required for approximation results to extend a neural tangent kernel (NTK) optimization argument to an under-parametrized regime and show direct approximation bounds for networks trained by gradient flow. Since gradient flow is only an idealization of a practical method, this paper establishes analogous results for networks trained by gradient descent. |
ISSN | 2730-5716; 2730-5724 |
DOI | 10.1007/s43670-025-00116-1 |
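
The summary contrasts gradient flow, a continuous-time idealization, with the gradient descent iterations used in practice. As a hedged illustration only (not the paper's construction or its bounds), the sketch below trains the outer weights of a shallow ReLU network by plain gradient descent and compares the result with a finely discretized emulation of gradient flow over the same amount of "time"; the network width, target function, step size, and random-feature setup are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's setting):
# gradient descent is the forward-Euler discretization of gradient flow,
#   d/dt theta(t) = -grad L(theta(t)),   theta_{k+1} = theta_k - h * grad L(theta_k).
import numpy as np

rng = np.random.default_rng(0)

# Training data: a smooth 1-D target sampled at n points.
n, width = 32, 200
x = np.linspace(-1.0, 1.0, n)
y = np.sin(np.pi * x)

# Shallow network f(x) = sum_j a_j * relu(w_j * x + b_j) / sqrt(width);
# only the outer weights a are trained, inner weights stay at random init.
w = rng.normal(size=width)
b = rng.normal(size=width)
phi = np.maximum(w[None, :] * x[:, None] + b[None, :], 0.0) / np.sqrt(width)  # n x width

def loss_grad(a):
    """Squared loss L(a) = (1/2n) ||phi a - y||^2 and its gradient in a."""
    r = phi @ a - y
    return 0.5 * np.mean(r**2), phi.T @ r / n

a_gd = np.zeros(width)   # gradient descent iterate
a_gf = np.zeros(width)   # gradient "flow", emulated with much smaller Euler steps
h, steps = 0.5, 2000     # assumed step size and iteration budget
substeps = 50            # each GD step is matched by `substeps` tiny flow steps

for _ in range(steps):
    _, g = loss_grad(a_gd)
    a_gd -= h * g                      # one discrete gradient descent step of size h
    for _ in range(substeps):          # the same time interval h along gradient flow
        _, g = loss_grad(a_gf)
        a_gf -= (h / substeps) * g

print("GD   loss:", loss_grad(a_gd)[0])
print("flow loss:", loss_grad(a_gf)[0])
print("parameter gap ||a_gd - a_gf||:", np.linalg.norm(a_gd - a_gf))
```

For step sizes small relative to the curvature of the loss, the two trajectories remain close, which is the informal sense in which results proved for gradient flow are expected to carry over to gradient descent; the paper makes this rigorous in its own setting.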