Near-Optimal Regret Bounds for Thompson Sampling
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art...
Published in | Journal of the ACM Vol. 64; no. 5; pp. 1 - 24 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | New York: Association for Computing Machinery, 01.10.2017 |
ISSN | 0004-5411, 1557-735X |
DOI | 10.1145/3088510 |