Bandits with Mean Bounds
| Field | Value |
|---|---|
| Published in | arXiv.org |
| Main Authors | |
| Format | Paper |
| Language | English |
| Published | Ithaca: Cornell University Library, arXiv.org, 28.10.2024 |
| Subjects | |
| Online Access | Get full text |
| Summary | We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally. |
| ISSN | 2331-8422 |
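The summary above describes exploiting known bounds on arm means to restrict the arm set and temper exploration. The following is a minimal illustrative sketch of that general idea in a UCB-style bandit, not the paper's R-OFUL or GLUE algorithms: arms whose upper mean bound falls below the largest lower bound are pruned, and optimistic indices are clipped into the known bounds. All arm means, bounds, and the horizon below are hypothetical.

```python
import numpy as np

# Illustrative sketch only: a UCB-style bandit that uses assumed mean bounds
# [lo_i, hi_i] to (a) prune arms provably suboptimal from the bounds alone and
# (b) clip optimistic indices into the bounds. Not the paper's R-OFUL/GLUE.

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # hypothetical Bernoulli arm means
lo = np.array([0.1, 0.4, 0.6])           # assumed lower bounds on the means
hi = np.array([0.4, 0.6, 0.9])           # assumed upper bounds on the means
T = 5000                                 # hypothetical horizon

# Prune arms whose upper bound is below the best lower bound: hi_i < max_j lo_j.
active = np.flatnonzero(hi >= lo.max())

counts = np.zeros(len(true_means))
sums = np.zeros(len(true_means))

for t in range(1, T + 1):
    unplayed = [a for a in active if counts[a] == 0]
    if unplayed:
        # Play each active arm once before relying on confidence indices.
        arm = unplayed[0]
    else:
        means = sums[active] / counts[active]
        bonus = np.sqrt(2.0 * np.log(t) / counts[active])
        # Clip the optimistic index into the known mean bounds.
        index = np.clip(means + bonus, lo[active], hi[active])
        arm = active[np.argmax(index)]
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts.astype(int))
```

In this sketch, the pruning step plays the role of restricting the arm set, and the clipping step caps how optimistic the index of a bounded arm can be, which reduces pulls of arms whose upper mean bound is small; the paper's algorithms instead derive tighter subgaussian estimates from the bounds, which this toy example does not attempt.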