The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits

Bibliographic Details
Published in: Frontiers in Artificial Intelligence, Vol. 4, p. 715690
Main Author: Loecher, Markus
Format: Journal Article
Language: English
Published: Frontiers Media S.A., Switzerland, 09.07.2021
Summary: The connection between optimal stopping times of American options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, which allocates observations to arms at random, in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri-testing and post-testing components. We further show a strong dependence of the parameters of interest on the assumed prior probability density.
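
The summary describes Thompson sampling combined with an optional-stopping rule under a possibly misspecified prior. The following is a minimal sketch of that general setup, not the experimental design of the article: the two Bernoulli arms, the shared Beta(1, 1) prior, and the 0.95 posterior-probability stopping threshold are illustrative assumptions chosen for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not taken from the article):
true_means = np.array([0.50, 0.55])   # true Bernoulli success rates, unknown to the algorithm
prior_a, prior_b = 1.0, 1.0           # Beta(1, 1) prior; change these to mimic prior mismatch
stop_threshold = 0.95                 # stop once one arm is this likely to be optimal
max_rounds = 10_000

successes = np.zeros(2)
failures = np.zeros(2)
regret = 0.0

for t in range(1, max_rounds + 1):
    # Thompson sampling: draw one value from each arm's Beta posterior and play the
    # arm with the largest draw; this allocates observations in proportion to the
    # posterior probability that each arm is optimal.
    draws = rng.beta(prior_a + successes, prior_b + failures)
    arm = int(np.argmax(draws))

    reward = float(rng.random() < true_means[arm])
    successes[arm] += reward
    failures[arm] += 1.0 - reward
    regret += true_means.max() - true_means[arm]

    # Optional stopping: estimate P(arm 1 is optimal) by Monte Carlo over the
    # posteriors and stop the experiment as soon as either arm looks clearly best.
    post = rng.beta(prior_a + successes, prior_b + failures, size=(5000, 2))
    p_arm1_best = (post[:, 1] > post[:, 0]).mean()
    if p_arm1_best > stop_threshold or p_arm1_best < 1.0 - stop_threshold:
        break

print(f"stopped after {t} rounds, cumulative regret {regret:.2f}, P(arm 1 best) {p_arm1_best:.3f}")

Changing prior_a and prior_b (for example to an optimistic Beta(5, 1)) shifts both how observations are allocated and when the stopping rule fires, which is the kind of interaction between prior mismatch and optional stopping that the abstract describes.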
Bibliography: This article was submitted to Artificial Intelligence in Finance, a section of the journal Frontiers in Artificial Intelligence
Edited by: Peter Schwendner, Zurich University of Applied Sciences, Switzerland
Reviewed by: Norbert Hilber, ZHAW, Switzerland
Bertrand Kian Hassani, University College London, United Kingdom
ISSN: 2624-8212
DOI: 10.3389/frai.2021.715690