Near-Optimal Regret Bounds for Thompson Sampling

Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art...

Bibliographic Details
Published in Journal of the ACM, Vol. 64, no. 5, pp. 1-24
Main Authors Agrawal, Shipra; Goyal, Navin
Format Journal Article
Language English
Published New York Association for Computing Machinery 01.10.2017
ISSN 0004-5411, 1557-735X
DOI 10.1145/3088510
