High-Probability Kernel Alignment Regret Bounds for Online Kernel Selection

Bibliographic Details
Published in	Machine Learning and Knowledge Discovery in Databases. Research Track, Vol. 12975, pp. 67-83
Main Authors	Liao, Shizhong; Li, Junfan
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2021
Series	Lecture Notes in Computer Science
Subjects	Kernel method; Model selection; Online learning
Summary:	In this paper, we study data-dependent regret bounds for online kernel selection in the regime of online classification with the hinge loss. Existing work only achieves $O(\Vert f\Vert^2_{\mathcal{H}_{\kappa}} T^{\alpha})$, $\frac{1}{2} \le \alpha < 1$, regret bounds, where $\kappa \in \mathcal{K}$, a preset candidate set. These worst-case regret bounds cannot reveal that kernel selection improves on the performance of single-kernel learning in some benign environments. We develop two adaptive online kernel selection algorithms and obtain the first high-probability regret bound depending on $\mathcal{A}(\mathcal{I}_T,\kappa)$, a variant of kernel alignment. If there is a kernel in the candidate set matching the data well, then our algorithms can improve the learning performance significantly and reduce the time complexity. Our results also justify using kernel alignment as a criterion for evaluating kernel functions. The first algorithm has an $O(T/K)$ per-round time complexity and enjoys an $O(\Vert f\Vert^2_{\mathcal{H}_{i^*}} \sqrt{K\mathcal{A}(\mathcal{I}_T,\kappa_{i^*})})$ high-probability regret bound. The second algorithm enjoys an $\tilde{O}(\beta^{-1} \sqrt{T\mathcal{A}(\mathcal{I}_T,\kappa_{i^*})})$ per-round time complexity and achieves an $\tilde{O}(\Vert f\Vert^2_{\mathcal{H}_{i^*}} K^{\frac{1}{2}} \beta^{\frac{1}{2}} T^{\frac{1}{4}} \mathcal{A}(\mathcal{I}_T,\kappa_{i^*})^{\frac{1}{4}})$ high-probability regret bound, where $\beta \ge 1$ is a balancing factor and $\kappa_{i^*} \in \mathcal{K}$ is the kernel with minimal $\mathcal{A}(\mathcal{I}_T,\kappa)$.
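To make the data dependence in the summary concrete, the stated bounds can be evaluated in two illustrative regimes. This is only a substitution sketch: the summary does not state the range of $\mathcal{A}(\mathcal{I}_T,\kappa_{i^*})$, so the two cases below (order $T$ for poorly matched data, constant order for well-matched data) are assumptions made here for illustration, and the shorthand $\mathcal{A}$ is introduced only for this example.

```latex
% Shorthand (introduced here): \mathcal{A} := \mathcal{A}(\mathcal{I}_T,\kappa_{i^*}).
% Assumption (not stated in the summary): \mathcal{A} lies between O(1) and \Theta(T).
\begin{align*}
\text{Case } \mathcal{A} = \Theta(T):\quad
  & \text{Algorithm 1 regret: }
    O\big(\Vert f\Vert^2_{\mathcal{H}_{i^*}} \sqrt{KT}\big), \\
  & \text{Algorithm 2 time: }
    \tilde{O}\big(\beta^{-1}\sqrt{T \cdot T}\big) = \tilde{O}(T/\beta), \\
  & \text{Algorithm 2 regret: }
    \tilde{O}\big(\Vert f\Vert^2_{\mathcal{H}_{i^*}} K^{\frac{1}{2}} \beta^{\frac{1}{2}}
      T^{\frac{1}{4}} T^{\frac{1}{4}}\big)
    = \tilde{O}\big(\Vert f\Vert^2_{\mathcal{H}_{i^*}} \sqrt{\beta K T}\big). \\[4pt]
\text{Case } \mathcal{A} = O(1):\quad
  & \text{Algorithm 1 regret: }
    O\big(\Vert f\Vert^2_{\mathcal{H}_{i^*}} \sqrt{K}\big), \\
  & \text{Algorithm 2 time: }
    \tilde{O}\big(\beta^{-1}\sqrt{T}\big), \\
  & \text{Algorithm 2 regret: }
    \tilde{O}\big(\Vert f\Vert^2_{\mathcal{H}_{i^*}} K^{\frac{1}{2}} \beta^{\frac{1}{2}}
      T^{\frac{1}{4}}\big).
\end{align*}
```

Under these assumptions, the first case has the same $T^{\frac{1}{2}}$ dependence as the $\alpha = \frac{1}{2}$ end of the existing bounds, while the second case removes the dependence on $T$ from the first algorithm's bound. For the second algorithm, increasing $\beta$ divides the per-round time bound by $\beta$ and multiplies the regret bound by $\beta^{\frac{1}{2}}$.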
Bibliography:	Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-86486-6_5) contains supplementary material, which is available to authorized users. This work was supported in part by the National Natural Science Foundation of China under grant No. 62076181.
ISBN:	3030864855 9783030864859
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-030-86486-6_5