Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

Bibliographic Details
Main Authors	Dorn, Yuriy, Katrutsa, Aleksandr, Latypov, Ilgam, Pudovikov, Andrey
Format	Journal Article
Language	English
Published	10.02.2024
Subjects	Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning
Summary:	In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive regret bounds that correspond to the convergence rates of the optimization methods. We propose a new algorithm, Clipped-SGD-UCB, and show, both theoretically and empirically, that when the reward noise is symmetric, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left(T^{\frac{1}{1+\alpha}} K^{\frac{\alpha}{1+\alpha}}\right)$ when the reward distribution satisfies $\mathbb{E}_{X \in D}[\|X\|^{1+\alpha}] \leq \sigma^{1+\alpha}$ ($\alpha \in (0, 1]$), i.e., the algorithm performs better than the general lower bound for heavy-tailed bandits would suggest. Moreover, the same bound holds even when the reward distribution has no expectation, that is, when $\alpha<0$.
DOI:	10.48550/arxiv.2402.07062
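
Illustration:	The summary names the idea of driving a UCB index with a clipped-SGD estimate but gives no algorithmic details. The following is a minimal sketch of that general idea only, not the authors' Clipped-SGD-UCB. Everything specific in it is an assumption made for illustration: the quadratic objective, the step size 1/sqrt(n), the clipping level, the bonus term, and the names ClippedSGDMean and run_bandit. Rewards are drawn with symmetric Cauchy noise, which has no expectation.

```python
import math
import random


def clip(value, threshold):
    """Clip a scalar to the interval [-threshold, threshold]."""
    return max(-threshold, min(threshold, value))


class ClippedSGDMean:
    """Location estimate for one arm via SGD with clipped gradients.

    Assumption: we minimise f(x) = E[(x - X)^2] / 2, whose stochastic
    gradient at x is (x - X). Clipping bounds the effect of outliers.
    """

    def __init__(self, clip_level=1.0):
        self.clip_level = clip_level  # hypothetical tuning parameter
        self.estimate = 0.0
        self.pulls = 0

    def update(self, reward):
        self.pulls += 1
        step = 1.0 / math.sqrt(self.pulls)  # assumed step-size schedule
        grad = self.estimate - reward
        self.estimate -= step * clip(grad, self.clip_level)


def ucb_index(arm, t, bonus_scale):
    """Estimate plus an exploration bonus (assumed form, not from the paper)."""
    return arm.estimate + bonus_scale * math.sqrt(math.log(t) / arm.pulls)


def run_bandit(centers, horizon, clip_level=1.0, bonus_scale=2.0, seed=0):
    """Play a bandit whose arm i returns centers[i] plus standard Cauchy noise."""
    rng = random.Random(seed)
    arms = [ClippedSGDMean(clip_level) for _ in centers]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            chosen = t - 1  # pull every arm once to initialise
        else:
            chosen = max(
                range(len(arms)),
                key=lambda i: ucb_index(arms[i], t, bonus_scale),
            )
        # Symmetric Cauchy noise via the inverse CDF: tan(pi * (U - 1/2)).
        noise = math.tan(math.pi * (rng.random() - 0.5))
        arms[chosen].update(centers[chosen] + noise)
    return arms
```

Calling run_bandit([0.0, 1.0, 3.0], horizon=3000) returns the per-arm estimators; their pulls and estimate attributes show how often each arm was played and where its location estimate ended up. Because the noise is symmetric, the clipped gradient has zero mean at the true centre, which is why clipping does not bias the estimate in this setting.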