Online Learning with Bounded Recall

Bibliographic Details
Published in: arXiv.org
Main Authors: Schneider, Jon; Vodrahalli, Kiran
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 31.05.2024
Summary: We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm \(\mathcal{A}\) is \(M\)-\(\textit{bounded-recall}\) if its output at time \(t\) can be written as a function of the \(M\) previous rewards (and not, e.g., any other internal state of \(\mathcal{A}\)). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last \(M\) rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of \(\Theta(1/\sqrt{M})\), which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past \(M\) losses: any bounded-recall algorithm that plays a symmetric function of the past \(M\) losses must incur constant regret per round.
ISSN: 2331-8422
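
The naive construction the summary mentions (running Hedge over only the last \(M\) reward vectors) can be made concrete. The sketch below is an illustrative rendering, not the paper's code: the learning rate, function names, and the assumption that rewards lie in \([0, 1]\) are all hypothetical. Its output at time \(t\) depends only on the \(M\) most recent rewards, so it is \(M\)-bounded-recall by construction; the paper's first result is that precisely this kind of windowed mean-based rule still incurs constant regret per round.

```python
import numpy as np

def hedge_last_M(recent_rewards, eta=0.1):
    """Distribution over n experts computed from a window of past rewards.

    recent_rewards: list of up to M arrays of shape (n,), one per past
    round, holding each expert's reward in that round. The output is a
    function of this window alone (no other internal state), which is
    what makes the rule M-bounded-recall. eta is an assumed learning rate.
    """
    totals = np.sum(recent_rewards, axis=0)          # cumulative reward per expert over the window
    weights = np.exp(eta * (totals - totals.max()))  # exponential weights, shifted for numerical stability
    return weights / weights.sum()                   # Hedge plays experts proportionally to their weights

# Toy usage: 3 experts, a window of M = 5 rounds of random rewards in [0, 1].
rng = np.random.default_rng(0)
M, n = 5, 3
window = [rng.random(n) for _ in range(M)]
print(hedge_last_M(window))
```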