Online Learning with Bounded Recall
Published in | arXiv.org
---|---
Main Authors |
Format | Paper
Language | English
Published | Ithaca: Cornell University Library, arXiv.org, 31.05.2024
Subjects |
Summary: We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm \(\mathcal{A}\) is \(M\)-\(\textit{bounded-recall}\) if its output at time \(t\) can be written as a function of the \(M\) previous rewards (and not, e.g., any other internal state of \(\mathcal{A}\)). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last \(M\) rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of \(\Theta(1/\sqrt{M})\), which we complement with a tight lower bound. Finally, we show that unlike the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past \(M\) losses: any bounded-recall algorithm which plays a symmetric function of the past \(M\) losses must incur constant regret per round.
ISSN: 2331-8422
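
The windowed-Hedge construction the summary refers to (the natural approach the paper proves fails) is easy to state concretely: at each round, run Hedge as if the last \(M\) loss vectors were the entire history. Below is a minimal Python sketch under that reading; the function name, parameter names, and the choice of losses in \([0,1]\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def windowed_hedge(recent_losses, eta):
    """One round of the 'natural' M-bounded-recall construction: run
    Hedge using only the last M loss vectors (rows of recent_losses).

    recent_losses: array of shape (m, K), m <= M, one row per round,
    one column per action. Returns a distribution over the K actions.

    Note: the paper shows this mean-based windowed approach fails --
    any such algorithm incurs constant regret per round.
    """
    cumulative = recent_losses.sum(axis=0)   # losses seen inside the window
    weights = np.exp(-eta * cumulative)      # exponential weights (Hedge)
    return weights / weights.sum()

# Illustrative usage: K = 3 actions, recall window M = 100,
# eta set to the standard Hedge rate over M rounds.
rng = np.random.default_rng(0)
M, K = 100, 3
window = rng.random((M, K))                  # stand-in for the last M losses
p = windowed_hedge(window, eta=np.sqrt(np.log(K) / M))
print(p)                                     # 3 probabilities summing to 1
```

Because the output depends only on the contents of the window, this is \(M\)-bounded-recall in the paper's sense; the abstract's negative result says no tuning of the learning rate rescues it, which is what motivates the stationary construction achieving \(\Theta(1/\sqrt{M})\) per-round regret.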