Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise
In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorit...
Saved in:
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
10.02.2024
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | In this study, we propose a new method for constructing UCB-type algorithms
for stochastic multi-armed bandits based on general convex optimization methods
with an inexact oracle. We derive the regret bounds corresponding to the
convergence rates of the optimization methods. We propose a new algorithm
Clipped-SGD-UCB and show, both theoretically and empirically, that in the case
of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$
regret bound instead of $O\left (T^{\frac{1}{1+\alpha}}
K^{\frac{\alpha}{1+\alpha}} \right)$ for the case when the reward distribution
satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$
($\alpha \in (0, 1])$, i.e. perform better than it is assumed by the general
lower bound for bandits with heavy-tails. Moreover, the same bound holds even
when the reward distribution does not have the expectation, that is, when
$\alpha<0$. |
---|---|
DOI: | 10.48550/arxiv.2402.07062 |