A modern Bayesian look at the multi-armed bandit


Bibliographic Details
Published in: Applied Stochastic Models in Business and Industry, Vol. 26, No. 6, pp. 639–658
Main Author: Scott, Steven L.
Format: Journal Article
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 01.11.2010

Summary: A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
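
The allocation rule described in the summary is simple to sketch in code. The following minimal illustration (not taken from the article) assumes Bernoulli payoffs with independent Beta(1, 1) priors: each round, draw one sample from every arm's posterior and play the arm whose draw is largest. The argmax of such a joint posterior draw is distributed exactly according to the posterior probability that each arm is optimal, which is randomized probability matching.

    import numpy as np

    def randomized_probability_matching(successes, failures, rng):
        # One posterior draw per arm under Beta(1, 1) priors; the argmax
        # is a sample from the posterior probability that each arm is optimal.
        draws = rng.beta(successes + 1, failures + 1)
        return int(np.argmax(draws))

    # Simulate a two-armed Bernoulli bandit with hypothetical payoff rates.
    true_rates = [0.05, 0.10]          # unknown to the experimenter
    successes = np.zeros(2)
    failures = np.zeros(2)
    rng = np.random.default_rng(0)
    for _ in range(1000):
        arm = randomized_probability_matching(successes, failures, rng)
        reward = rng.random() < true_rates[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
    print(successes + failures)        # most pulls flow to the better arm

Because arms are chosen by sampling rather than by a deterministic argmax over point estimates, the rule keeps exploring uncertain arms while concentrating observations on apparently good ones, and it extends to any payoff distribution from which posterior draws can be obtained.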
ISSN: 1524-1904
EISSN: 1526-4025
DOI: 10.1002/asmb.874