The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits
Published in | Frontiers in Artificial Intelligence, Vol. 4; Article 715690 |
---|---|
Format | Journal Article |
Language | English |
Published | Switzerland: Frontiers Media S.A, 09.07.2021 |

Summary: | The connection between optimal stopping times of American options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, which randomly allocates observations to arms in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri-testing and post-testing components. We further show a strong dependence of the parameters of interest on the assumed prior probability density. |
---|---|
Bibliography: | This article was submitted to Artificial Intelligence in Finance, a section of the journal Frontiers in Artificial Intelligence. Edited by: Peter Schwendner, Zurich University of Applied Sciences, Switzerland. Reviewed by: Norbert Hilber, ZHAW, Switzerland; Bertrand Kian Hassani, University College London, United Kingdom. |
ISSN: | 2624-8212 |
DOI: | 10.3389/frai.2021.715690 |
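The summary describes Thompson sampling combined with optional stopping, and a split of regret into peri-testing and post-testing parts, but this record does not give the paper's exact model. The sketch below is a minimal illustration under assumptions that are ours, not the authors': Bernoulli arms with conjugate Beta priors, a hypothetical stopping rule that peeks every `check_every` rounds and halts once some arm's Monte Carlo posterior probability of being optimal reaches `threshold`, and commitment to the arm with the highest posterior mean for the rest of the horizon. `run`, `prob_best`, `Result` and all parameter names are hypothetical. Regret is measured as pseudo-regret, i.e. the gap between the best true mean and the true mean of the arm played.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Result:
    stop_time: int        # round tau at which testing stopped
    committed_arm: int    # arm exploited for the remaining (horizon - tau) rounds
    peri_regret: float    # pseudo-regret accrued while testing
    post_regret: float    # pseudo-regret accrued after testing


def prob_best(alpha: Sequence[float], beta: Sequence[float],
              rng: random.Random, draws: int = 2000) -> List[float]:
    """Monte Carlo estimate of P(arm k is optimal) under independent Beta posteriors."""
    wins = [0] * len(alpha)
    for _ in range(draws):
        sample = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        wins[sample.index(max(sample))] += 1
    return [w / draws for w in wins]


def run(true_means: Sequence[float], horizon: int,
        priors: Optional[Sequence[Tuple[float, float]]] = None,
        threshold: float = 0.95, check_every: int = 50,
        seed: int = 0) -> Result:
    """Hypothetical sketch: Bernoulli Thompson sampling, optional stopping, then commit."""
    rng = random.Random(seed)
    k = len(true_means)
    if priors is None:
        priors = [(1.0, 1.0)] * k          # uniform Beta(1, 1) prior on every arm
    if len(priors) != k:
        raise ValueError("need one (alpha, beta) prior per arm")
    alpha = [float(a) for a, _ in priors]
    beta = [float(b) for _, b in priors]
    best = max(true_means)
    peri, t = 0.0, 0
    while t < horizon:
        # Thompson sampling: draw once from each posterior and play the argmax,
        # so each arm is chosen with its posterior probability of being optimal.
        draw = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = draw.index(max(draw))
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward               # conjugate Beta-Bernoulli update
        beta[arm] += 1 - reward
        peri += best - true_means[arm]
        t += 1
        # Optional stopping (assumed rule): peek every `check_every` rounds and stop
        # once some arm's posterior probability of being optimal reaches `threshold`.
        if t % check_every == 0 and max(prob_best(alpha, beta, rng)) >= threshold:
            break
    means = [a / (a + b) for a, b in zip(alpha, beta)]
    committed = means.index(max(means))
    post = (horizon - t) * (best - true_means[committed])
    return Result(t, committed, peri, post)


if __name__ == "__main__":
    arms = [0.10, 0.12]
    flat = run(arms, horizon=5_000)
    # Hypothetical misspecified prior: confident that arm 0 (the worse arm) is near 0.12.
    skewed = run(arms, horizon=5_000, priors=[(60, 440), (1, 19)])
    for name, r in (("flat", flat), ("skewed", skewed)):
        print(name, r.stop_time, r.committed_arm,
              round(r.peri_regret, 1), round(r.post_regret, 1))
```

Passing a confident but wrong prior through `priors` is one way to probe how prior mismatch interacts with the stopping rule in this toy setting: a strongly skewed prior can make the threshold fire early on the worse arm, which shifts regret from the peri-testing term into the post-testing term.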