Stochastic Gradient Succeeds for Bandits
We show that the \emph{stochastic gradient} bandit algorithm converges to a \emph{globally optimal} policy at an \(O(1/t)\) rate, even with a \emph{constant} step size. Remarkably, global convergence of the stochastic gradient bandit algorithm has not been previously established, even though it is a...
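The algorithm in question is the classic softmax policy-gradient method for multi-armed bandits, updated with an unbiased stochastic gradient and a constant step size. Below is a minimal sketch of that setup; the bandit instance (three Bernoulli arms with hypothetical means), the step size `eta = 0.1`, and the iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed bandit: Bernoulli rewards with these means.
true_means = np.array([0.2, 0.5, 0.8])
K = len(true_means)

theta = np.zeros(K)  # softmax logits (policy parameters)
eta = 0.1            # constant step size, the regime the abstract highlights

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

for t in range(50_000):
    pi = softmax(theta)
    a = rng.choice(K, p=pi)                   # sample an arm from the policy
    r = float(rng.random() < true_means[a])   # Bernoulli reward for that arm
    # Unbiased stochastic gradient of expected reward w.r.t. the logits:
    # grad_j = r * (1{a = j} - pi_j), i.e. r * grad log pi(a).
    grad = -r * pi
    grad[a] += r
    theta += eta * grad

pi = softmax(theta)
print(pi.argmax())  # the policy should concentrate on the best arm
```

Under the paper's result, the policy probabilities converge to the globally optimal arm despite the noise in the single-sample gradient, with no step-size decay required.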
| Published in | arXiv.org |
|---|---|
| Format | Paper |
| Language | English |
| Published | Ithaca: Cornell University Library, arXiv.org, 27.02.2024 |