Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent


Bibliographic Details
Published in: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 8220 - 8227
Main Authors: Ayadi, Imen; Turinici, Gabriel
Format: Conference Proceeding
Language: English
Published: IEEE, 10.01.2021

Summary: The minimization of the loss function is of paramount importance in deep neural networks. Many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second-order stochastic Runge-Kutta method and show that it yields a consistent procedure for the minimization of the loss function. In addition, it can be coupled, in an adaptive framework, with Stochastic Gradient Descent (SGD) to automatically adjust the learning rate of the SGD. The resulting adaptive SGD, called SGD-G2, shows good results in terms of convergence speed when tested on standard datasets.
DOI: 10.1109/ICPR48806.2021.9412831
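
A minimal Python sketch of the idea outlined in the summary, under explicit assumptions: a Heun-type (second-order stochastic Runge-Kutta) step is compared with the plain SGD (explicit Euler) step, and the mismatch between the two drives the learning-rate adaptation. The synthetic data, loss, and adaptation rule below are illustrative placeholders, not the authors' exact SGD-G2 formula.

# Sketch of an adaptive-learning-rate SGD step guided by a second-order
# stochastic Runge-Kutta (Heun) predictor. All names and the adaptation
# thresholds are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))            # synthetic regression inputs
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=256)

def grad(w, idx):
    """Stochastic gradient of the mean-squared error on a mini-batch."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(10)
h = 0.1                                    # learning rate, adapted on the fly
for step in range(200):
    idx = rng.choice(len(X), size=32, replace=False)
    g1 = grad(w, idx)                      # gradient at the current iterate
    w_euler = w - h * g1                   # explicit Euler step = plain SGD
    g2 = grad(w_euler, idx)                # gradient at the Euler predictor
    w_heun = w - 0.5 * h * (g1 + g2)       # Heun / second-order RK step

    # Adaptation heuristic (an assumption, not the paper's rule): if the SGD
    # and RK2 iterates disagree too much, shrink h; if they agree, grow h.
    err = np.linalg.norm(w_euler - w_heun) / (np.linalg.norm(w_heun) + 1e-12)
    h *= 0.5 if err > 1e-2 else 1.05

    w = w_euler                            # SGD update with the adapted rate

The design choice illustrated here is that the second-order method is only used as a reference solution for calibrating the step size; the parameter update itself remains a standard SGD step, which keeps the per-iteration cost close to that of plain SGD apart from the extra gradient evaluation.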