High-Probability Kernel Alignment Regret Bounds for Online Kernel Selection
Published in | Machine Learning and Knowledge Discovery in Databases. Research Track Vol. 12975; pp. 67 - 83 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2021; Springer International Publishing |
Series | Lecture Notes in Computer Science |
Subjects | |
Summary: | In this paper, we study data-dependent regret bounds for online kernel selection in the regime of online classification with the hinge loss. Existing work only achieves $$O(\Vert f\Vert ^2_{\mathcal {H}_{\kappa }}T^\alpha ), \frac{1}{2}\le \alpha <1$$ regret bounds, where $$\kappa \in \mathcal {K}$$ and $$\mathcal {K}$$ is a preset candidate set. Such worst-case regret bounds cannot reveal that kernel selection improves on the performance of single-kernel learning in some benign environments. We develop two adaptive online kernel selection algorithms and obtain the first high-probability regret bound depending on $$\mathcal {A}(\mathcal {I}_T,\kappa )$$, a variant of kernel alignment. If the candidate set contains a kernel that matches the data well, our algorithms can significantly improve learning performance and reduce time complexity. Our results also justify using kernel alignment as a criterion for evaluating kernel functions. The first algorithm has an $$O(T/K)$$ per-round time complexity and enjoys an $$O(\Vert f\Vert ^2_{\mathcal {H}_{i^*}} \sqrt{K\mathcal {A}(\mathcal {I}_T,\kappa _{i^*})})$$ high-probability regret bound. The second algorithm has an $$\tilde{O}(\beta ^{-1} \sqrt{T\mathcal {A}(\mathcal {I}_T,\kappa _{i^*})})$$ per-round time complexity and achieves an $$\tilde{O}(\Vert f\Vert ^2_{\mathcal {H}_{{i^*}}}K^{\frac{1}{2}}\beta ^{\frac{1}{2}} T^{\frac{1}{4}}\mathcal {A}(\mathcal {I}_T,\kappa _{i^*})^{\frac{1}{4}})$$ high-probability regret bound, where $$\beta \ge 1$$ is a balancing factor and $$\kappa _{i^*}\in \mathcal {K}$$ is the kernel with minimal $$\mathcal {A}(\mathcal {I}_T,\kappa )$$. |
---|---|
Bibliography: | Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-86486-6_5) contains supplementary material, which is available to authorized users. This work was supported in part by the National Natural Science Foundation of China under grant No. 62076181. |
ISBN: | 3030864855 9783030864859 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-030-86486-6_5 |
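The summary proposes kernel alignment as a criterion for evaluating candidate kernels. This record does not define the variant $$\mathcal {A}(\mathcal {I}_T,\kappa )$$ analysed in the chapter, so the Python sketch below computes only the classical (uncentered) kernel-target alignment ⟨K, yyᵀ⟩_F / (‖K‖_F ‖yyᵀ‖_F) on a fixed batch of labelled points and uses it to rank a few Gaussian bandwidths. It is not either of the chapter's online algorithms. Note also that this classical score is maximized by a well-matched kernel, whereas the chapter's κ_{i*} is the kernel that minimizes its own variant. The helper names (`gaussian_kernel`, `kernel_target_alignment`, `rank_kernels`) and the toy data are hypothetical illustrations.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """Gaussian (RBF) kernel matrix k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Z ** 2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def kernel_target_alignment(K, y):
    """Classical uncentered alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F).

    Assumption: this is the textbook kernel-target alignment, NOT the
    variant A(I_T, kappa) analysed in the chapter. Larger is better here.
    """
    yyT = np.outer(y, y)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return float(np.sum(K * yyT) / (np.linalg.norm(K) * np.linalg.norm(yyT)))


def rank_kernels(X, y, sigmas):
    """Score each candidate bandwidth; return the best one and all scores."""
    scores = {s: kernel_target_alignment(gaussian_kernel(X, X, s), y) for s in sigmas}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters labelled +1 / -1.
    X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
                  [10.0, 10.0], [10.5, 10.0], [10.0, 10.5]])
    y = np.array([1, 1, 1, -1, -1, -1])
    best, scores = rank_kernels(X, y, [0.01, 1.0, 100.0])
    print(best, scores)  # best is 1.0 for this data
```

A very small bandwidth makes K nearly the identity matrix and a very large one makes K nearly the all-ones matrix; neither reflects the label structure, which is why the intermediate bandwidth scores highest on these two clusters.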