Optimal Dynamic Regret in LQR Control (2206.09257v1)
Abstract: We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3} \mathcal{TV}(M_{1:n})^{2/3}, 1\})$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1,\ldots,M_n$ -- chosen in hindsight to cater to unknown nonstationarity. This rate improves on the best known rate of $\tilde{O}(\sqrt{n(\mathcal{TV}(M_{1:n})+1)})$ for general convex losses, and we prove that it is information-theoretically optimal for LQR. Main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of "minibatched" quadratic losses, which could be of independent interest.
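To make the two regret rates concrete, here is a minimal Python sketch (not from the paper) that computes the total variation $\mathcal{TV}(M_{1:n})$ of a toy sequence of DAC policy matrices and evaluates both bounds on it. The Frobenius norm and the random-drift model for $M_1,\ldots,M_n$ are assumptions made purely for illustration; the paper's exact norm and constants may differ.

```python
# Illustrative sketch: total variation of a DAC policy sequence and a
# numerical comparison of the paper's rate against the prior best rate.
import numpy as np

def total_variation(Ms):
    """TV(M_{1:n}) = sum_{t=2}^{n} ||M_t - M_{t-1}||: cumulative drift of
    the oracle policy parameters (Frobenius norm assumed here)."""
    return sum(np.linalg.norm(Ms[t] - Ms[t - 1]) for t in range(1, len(Ms)))

def new_rate(n, tv):
    """Optimal dynamic regret from the paper, ignoring log factors."""
    return max(n ** (1 / 3) * tv ** (2 / 3), 1.0)

def prior_rate(n, tv):
    """Prior best-known rate for general convex losses, ignoring log factors."""
    return np.sqrt(n * (tv + 1))

# Toy example: n policy matrices following a slow random walk, so the
# total variation grows roughly like sqrt(n).
rng = np.random.default_rng(0)
n, d = 10_000, 4
Ms = np.cumsum(rng.normal(scale=n ** -0.5, size=(n, d, d)), axis=0)

tv = total_variation(Ms)
print(f"TV(M_1:n)  ~ {tv:.1f}")
print(f"new rate   ~ {new_rate(n, tv):.1f}")    # O(n^(1/3) TV^(2/3))
print(f"prior rate ~ {prior_rate(n, tv):.1f}")  # O(sqrt(n (TV+1)))
```

In this regime ($\mathcal{TV} = o(n)$) the $n^{1/3}\mathcal{TV}^{2/3}$ bound is visibly smaller than $\sqrt{n(\mathcal{TV}+1)}$, which is the sense in which the new rate improves on the prior one.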