Fast and strong convergence of online learning algorithms

Bibliographic Details
Published in	Advances in Computational Mathematics, Vol. 45, no. 5-6, pp. 2745-2770
Main Authors	Guo, Zheng-Chu, Shi, Lei
Format	Journal Article
Language	English
Published	New York: Springer US, 01.12.2019; Springer Nature B.V.
Subjects	Algorithms; Computational mathematics; Computational Mathematics and Numerical Analysis; Computational Science and Engineering; Convergence; Distance learning; Fine structure; Hilbert space; Learning theory; Machine learning; Mathematical and Computational Biology; Mathematical Modeling and Industrial Mathematics; Mathematics; Mathematics and Statistics; Regularization; Visualization; 62J02; Strong convergence in an RKHS; 68T05; Learning theory; Online learning; 68Q32; 62L20; Capacity dependent error analysis

Summary:	In this paper, we study an online learning algorithm without explicit regularization terms. The algorithm is essentially a stochastic gradient descent scheme in a reproducing kernel Hilbert space (RKHS). The polynomially decaying step size used in each iteration can play the role of regularization to ensure the generalization ability of the algorithm. We develop a novel capacity-dependent analysis of the performance of the last iterate of the algorithm, which answers an open problem in learning theory. The contribution of this paper is twofold. First, the capacity-dependent analysis leads to sharp convergence rates in the standard mean square distance, improving on results in the literature. Second, we establish, for the first time, strong convergence of the last iterate with polynomially decaying step sizes in the RKHS norm. We demonstrate that the theoretical analysis fully exploits the fine structure of the underlying RKHS and can thus lead to sharp error estimates for the online learning algorithm.
ISSN:	1019-7168 1572-9044
DOI:	10.1007/s10444-019-09707-8
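
The scheme described in the summary (stochastic gradient descent in an RKHS, no explicit regularization term, polynomially decaying step sizes, last iterate as output) can be illustrated with a minimal sketch. It assumes the square loss, a Gaussian kernel, and step sizes of the form eta_t = eta1 * t^(-theta); none of these specifics are stated in this record, and the function names, kernel width, noise level, and parameter values are hypothetical choices made for illustration only, not the authors' implementation.

```python
import math
import random


def gaussian_kernel(x, z, width=0.2):
    """Gaussian kernel on the real line; the width is an illustrative assumption."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def online_kernel_sgd(samples, eta1=0.5, theta=0.5, kernel=gaussian_kernel):
    """Unregularized online gradient descent in an RKHS (square loss assumed).

    Update: f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * K(x_t, .),
    with eta_t = eta1 * t**(-theta) and f_1 = 0.
    Returns the kernel centers and coefficients of the last iterate.
    """
    centers, coeffs = [], []
    for t, (x, y) in enumerate(samples, start=1):
        # Evaluate the current iterate f_t at the new input x_t.
        pred = sum(c * kernel(z, x) for z, c in zip(centers, coeffs))
        # Polynomially decaying step size; no regularization term is added.
        eta = eta1 * t ** (-theta)
        centers.append(x)
        coeffs.append(-eta * (pred - y))
    return centers, coeffs


def predict(centers, coeffs, x, kernel=gaussian_kernel):
    """Evaluate the kernel expansion sum_i coeffs[i] * K(centers[i], x)."""
    return sum(c * kernel(z, x) for z, c in zip(centers, coeffs))


def target(x):
    """Hypothetical regression function used to generate toy data."""
    return math.sin(2.0 * math.pi * x)


# Toy data stream: uniform inputs on [0, 1] with Gaussian label noise.
rng = random.Random(0)
samples = []
for _ in range(400):
    x = rng.random()
    samples.append((x, target(x) + rng.gauss(0.0, 0.1)))

centers, coeffs = online_kernel_sgd(samples)

# Compare the last iterate with the zero function on a uniform grid.
grid = [i / 50 for i in range(51)]
err_zero = sum(target(x) ** 2 for x in grid) / len(grid)
err_last = sum((predict(centers, coeffs, x) - target(x)) ** 2 for x in grid) / len(grid)
print(f"zero function: {err_zero:.4f}, last iterate: {err_last:.4f}")
```

In this sketch the decay exponent theta is the only quantity that limits how closely the iterates follow the noisy labels, which is the sense in which the step size stands in for a regularization term. The error printed here is a grid approximation of the mean square distance to the toy target, not the RKHS norm discussed in the summary.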