Optimal Dynamic Regret in LQR Control
Published in: arXiv.org
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 18.06.2022
Summary: We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of \(\tilde{O}(\max\{n^{1/3} \mathcal{TV}(M_{1:n})^{2/3}, 1\})\), where \(\mathcal{TV}(M_{1:n})\) is the total variation of any oracle sequence of Disturbance Action policies parameterized by \(M_1, \ldots, M_n\), chosen in hindsight to cater to unknown nonstationarity. This rate improves upon the best known rate of \(\tilde{O}(\sqrt{n (\mathcal{TV}(M_{1:n})+1)})\) for general convex losses, and we prove that it is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal \(\tilde{O}(n^{1/3})\) dynamic regret on a family of "minibatched" quadratic losses, which could be of independent interest.
ISSN: 2331-8422
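
For context, a hedged sketch of the quantities quoted in the summary, in the style of this literature (the specific norm and the exact comparator class below are assumptions, not taken verbatim from the paper): the oracle sequence of Disturbance Action policies \(\pi_{M_1}, \ldots, \pi_{M_n}\) has total variation
\[
\mathcal{TV}(M_{1:n}) \;=\; \sum_{t=2}^{n} \lVert M_t - M_{t-1} \rVert,
\]
and the dynamic (policy) regret of an online controller playing policies \(\pi_1, \ldots, \pi_n\) against losses \(\ell_1, \ldots, \ell_n\) is, schematically,
\[
\mathrm{Regret}_n \;=\; \sum_{t=1}^{n} \ell_t(\pi_t) \;-\; \sum_{t=1}^{n} \ell_t(\pi_{M_t}).
\]
As a worked instance of the two bounds quoted above: with \(n = 10^6\) rounds and \(\mathcal{TV}(M_{1:n}) = 1\), the stated rate gives \(\tilde{O}(n^{1/3}) = \tilde{O}(100)\), whereas the previous general convex rate gives \(\tilde{O}(\sqrt{n(\mathcal{TV}(M_{1:n})+1)}) = \tilde{O}(\sqrt{2 \times 10^6}) \approx \tilde{O}(1414)\).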