The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits

Bibliographic Details
Published in: Frontiers in Artificial Intelligence, Vol. 4, p. 715690
Main Author: Loecher, Markus
Format: Journal Article
Language: English
Published: Frontiers Media S.A., Switzerland, 09.07.2021
Summary: The connection between optimal stopping times of American options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, which allocates observations to arms at random, in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri-testing and post-testing components. We further show a strong dependence of the parameters of interest on the assumed prior probability density.
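
The summary describes Thompson sampling combined with an optional-stopping rule under a possibly misspecified prior. The following is a minimal sketch of that general setup, not the experimental design of the article: the two Bernoulli arms, the shared Beta(1, 1) prior, and the 0.95 posterior-probability stopping threshold are illustrative assumptions chosen for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not taken from the article):
true_means = np.array([0.50, 0.55])   # true Bernoulli success rates, unknown to the algorithm
prior_a, prior_b = 1.0, 1.0           # Beta(1, 1) prior; change these to mimic prior mismatch
stop_threshold = 0.95                 # stop once one arm is this likely to be optimal
max_rounds = 10_000

successes = np.zeros(2)
failures = np.zeros(2)
regret = 0.0

for t in range(1, max_rounds + 1):
    # Thompson sampling: draw one value from each arm's Beta posterior and play the
    # arm with the largest draw; this allocates observations in proportion to the
    # posterior probability that each arm is optimal.
    draws = rng.beta(prior_a + successes, prior_b + failures)
    arm = int(np.argmax(draws))

    reward = float(rng.random() < true_means[arm])
    successes[arm] += reward
    failures[arm] += 1.0 - reward
    regret += true_means.max() - true_means[arm]

    # Optional stopping: estimate P(arm 1 is optimal) by Monte Carlo over the
    # posteriors and stop the experiment as soon as either arm looks clearly best.
    post = rng.beta(prior_a + successes, prior_b + failures, size=(5000, 2))
    p_arm1_best = (post[:, 1] > post[:, 0]).mean()
    if p_arm1_best > stop_threshold or p_arm1_best < 1.0 - stop_threshold:
        break

print(f"stopped after {t} rounds, cumulative regret {regret:.2f}, P(arm 1 best) {p_arm1_best:.3f}")

Changing prior_a and prior_b (for example to an optimistic Beta(5, 1)) shifts both how observations are allocated and when the stopping rule fires, which is the kind of interaction between prior mismatch and optional stopping that the abstract describes.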
Bibliography: This article was submitted to Artificial Intelligence in Finance, a section of the journal Frontiers in Artificial Intelligence
Edited by: Peter Schwendner, Zurich University of Applied Sciences, Switzerland
Reviewed by: Norbert Hilber, ZHAW, Switzerland
Bertrand Kian Hassani, University College London, United Kingdom
ISSN: 2624-8212
DOI: 10.3389/frai.2021.715690