Tight conditions for when the NTK approximation is valid
Main Authors | , |
---|---|
Format | Journal Article |
Language | English |
Published | 22.05.2023 |
Summary: | We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of $\alpha = O(T)$ suffices for the NTK approximation to be valid until training time $T$. Our bound is tight and improves on the previous bound of Chizat et al. 2019, which required a larger rescaling factor of $\alpha = O(T^2)$. |
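The lazy-training rescaling the abstract refers to can be illustrated with a toy experiment: the model output is replaced by $\alpha\,(f(w,x) - f(w_0,x))$ and the square loss is normalized by $\alpha^2$, so that as $\alpha$ grows the weights move less from initialization and the model stays closer to its linearization (the NTK approximation). This is a minimal sketch of that general setup, not the paper's analysis; the model `f`, the data point, and all hyperparameters are hypothetical choices for illustration.

```python
# Illustrative sketch of lazy training via output rescaling (after Chizat
# et al. 2019). All concrete choices below (the toy model, data, step size)
# are hypothetical, made up for this example.
import numpy as np

def f(w, x):
    # toy nonlinear scalar model with two parameters
    return np.tanh(w[0] * x) * w[1]

def grad_f(w, x, eps=1e-6):
    # central-difference gradient of f with respect to w
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e, x) - f(w - e, x)) / (2 * eps)
    return g

def train(alpha, steps=200, lr=0.05):
    rng = np.random.default_rng(0)
    x, y = 0.7, 1.0                      # one data point, square loss
    w0 = rng.normal(size=2)
    w = w0.copy()
    for _ in range(steps):
        # rescaled model h(w, x) = alpha * (f(w, x) - f(w0, x));
        # the loss (1/(2 alpha^2)) * (h - y)^2 gives the update below
        resid = alpha * (f(w, x) - f(w0, x)) - y
        w = w - lr * (resid / alpha) * grad_f(w, x)
    # distance traveled from initialization
    return np.linalg.norm(w - w0)

# Larger alpha keeps the weights closer to w0 (the lazy regime), which is
# when the linearized (NTK) description of training is accurate.
print(train(alpha=100.0) < train(alpha=1.0))
```

The key mechanism: to fit the target, the rescaled model only needs $f(w,x) - f(w_0,x) \approx y/\alpha$, so the required parameter displacement shrinks as $\alpha$ grows, and the linearization of $f$ around $w_0$ remains a good approximation for longer training horizons.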
DOI: | 10.48550/arxiv.2305.13141 |