High-Probability Kernel Alignment Regret Bounds for Online Kernel Selection


Bibliographic Details
Published in: Machine Learning and Knowledge Discovery in Databases. Research Track, Vol. 12975, pp. 67-83
Main Authors: Liao, Shizhong; Li, Junfan
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2021
Series: Lecture Notes in Computer Science
Summary: In this paper, we study data-dependent regret bounds for online kernel selection in the regime of online classification with the hinge loss. Existing work only achieves $O(\Vert f\Vert^2_{\mathcal{H}_{\kappa}} T^\alpha)$, $\frac{1}{2}\le \alpha <1$, regret bounds, where $\kappa \in \mathcal{K}$, a preset candidate set. Such worst-case regret bounds cannot reveal whether kernel selection improves on the performance of single-kernel learning in benign environments. We develop two adaptive online kernel selection algorithms and obtain the first high-probability regret bound depending on $\mathcal{A}(\mathcal{I}_T,\kappa)$, a variant of kernel alignment. If there is a kernel in the candidate set that matches the data well, then our algorithms can improve the learning performance significantly and reduce the time complexity. Our results also justify using kernel alignment as a criterion for evaluating kernel functions. The first algorithm has an $O(T/K)$ per-round time complexity and enjoys an $O(\Vert f\Vert^2_{\mathcal{H}_{i^*}} \sqrt{K\mathcal{A}(\mathcal{I}_T,\kappa_{i^*})})$ high-probability regret bound. The second algorithm has an $\tilde{O}(\beta^{-1} \sqrt{T\mathcal{A}(\mathcal{I}_T,\kappa_{i^*})})$ per-round time complexity and achieves an $\tilde{O}(\Vert f\Vert^2_{\mathcal{H}_{i^*}} K^{\frac{1}{2}} \beta^{\frac{1}{2}} T^{\frac{1}{4}} \mathcal{A}(\mathcal{I}_T,\kappa_{i^*})^{\frac{1}{4}})$ high-probability regret bound, where $\beta \ge 1$ is a balancing factor and $\kappa_{i^*} \in \mathcal{K}$ is the kernel with minimal $\mathcal{A}(\mathcal{I}_T,\kappa)$.
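The summary refers to kernel alignment as a criterion for evaluating kernel functions but does not define the variant $\mathcal{A}(\mathcal{I}_T,\kappa)$ in this record. As a point of reference only, the sketch below computes the classical kernel-target alignment, $\langle K, yy^\top\rangle_F / (\Vert K\Vert_F \Vert yy^\top\Vert_F)$, for a few Gaussian candidate kernels. It is not the chapter's quantity or algorithm: the function names, the Gaussian candidate set, and the toy data are all assumptions made for illustration. Note also that the classical score is higher-is-better, whereas the summary selects the kernel with minimal $\mathcal{A}(\mathcal{I}_T,\kappa)$; how the two relate is not stated here.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Gram matrix of the Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def kernel_target_alignment(K, y):
    """Classical kernel-target alignment between Gram matrix K and labels y.

    Uses <K, y y^T>_F = y^T K y and ||y y^T||_F = y^T y.
    """
    y = np.asarray(y, dtype=float)
    numerator = y @ K @ y
    denominator = np.linalg.norm(K, "fro") * (y @ y)
    return numerator / denominator

if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters with labels -1 / +1.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
                   rng.normal(+2.0, 0.5, size=(20, 2))])
    y = np.concatenate([-np.ones(20), np.ones(20)])

    # Hypothetical candidate set: Gaussian kernels with different widths.
    for sigma in (0.1, 1.0, 10.0):
        K = gaussian_kernel_matrix(X, sigma)
        print(f"sigma={sigma}: alignment={kernel_target_alignment(K, y):.3f}")
```

Under this classical score, a candidate whose Gram matrix is close to $yy^\top$ scores near 1, which is one concrete sense in which a kernel can be said to match the data well.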
Bibliography: Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-86486-6_5) contains supplementary material, which is available to authorized users.
This work was supported in part by the National Natural Science Foundation of China under grant No. 62076181.
ISBN: 3030864855, 9783030864859
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-86486-6_5