Optimal Dynamic Regret in LQR Control

(2206.09257)
Published Jun 18, 2022 in cs.LG, math.DS, math.OC, and stat.ML

Abstract

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3}\,\mathcal{TV}(M_{1:n})^{2/3},\, 1\})$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1, \ldots, M_n$, chosen in hindsight to cater to unknown nonstationarity. The rate improves upon the best known rate of $\tilde{O}(\sqrt{n(\mathcal{TV}(M_{1:n})+1)})$ for general convex losses, and we prove that it is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of "minibatched" quadratic losses, which could be of independent interest.
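To make the abstract's quantities concrete, the following is a standard formalization of dynamic regret and total variation from the nonstochastic control literature; the exact notation here is an assumption for illustration, not text taken from the paper.

```latex
% Dynamic (policy) regret of the learner against an arbitrary comparator
% sequence M_{1:n} = (M_1, \dots, M_n) of Disturbance Action policies:
\mathrm{Regret}_n(M_{1:n})
  = \sum_{t=1}^{n} \ell_t(x_t, u_t)
  - \sum_{t=1}^{n} \ell_t\bigl(x_t(M_t),\, u_t(M_t)\bigr),
\qquad
% Total variation (path length) of the comparator sequence:
\mathcal{TV}(M_{1:n}) = \sum_{t=2}^{n} \bigl\lVert M_t - M_{t-1} \bigr\rVert.
```

With this notation, the stated bound $\tilde{O}(\max\{n^{1/3}\,\mathcal{TV}(M_{1:n})^{2/3},\, 1\})$ adapts to the nonstationarity of the best hindsight comparator sequence, without requiring $\mathcal{TV}(M_{1:n})$ to be known in advance.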

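Since the Disturbance Action policy class is central to the statement, here is a minimal sketch of rolling out one fixed such policy, assuming the standard DAC form $u_t = -Kx_t + \sum_{i=1}^{m} M^{[i]} w_{t-i}$ from the nonstochastic control literature. The names A, B, K, m, and the helper itself are illustrative assumptions; this is not the paper's algorithm.

```python
import numpy as np

def rollout_dac(A, B, K, M, x0, disturbances):
    """Roll out a fixed Disturbance Action policy M = [M1, ..., Mm] on
    x_{t+1} = A x_t + B u_t + w_t with quadratic stage costs.

    Control law (standard DAC form, an assumption for this sketch):
        u_t = -K x_t + sum_i M[i] @ w_{t-i}.
    """
    m = len(M)
    d_x = B.shape[0]
    x = x0
    past_w = [np.zeros(d_x) for _ in range(m)]  # w_{t-1}, ..., w_{t-m}
    total_cost = 0.0
    for w in disturbances:
        u = -K @ x + sum(M[i] @ past_w[i] for i in range(m))
        total_cost += x @ x + u @ u      # quadratic (LQR-style) stage loss
        x = A @ x + B @ u + w            # system evolves; w_t is observed
        past_w = [w] + past_w[:-1]       # shift the disturbance history
    return total_cost

# Tiny usage example on a scalar system (all numbers illustrative).
A = np.array([[0.9]]); B = np.array([[1.0]]); K = np.array([[0.5]])
M = [np.array([[0.1]]), np.array([[0.05]])]           # history length m = 2
ws = [np.array([0.1 * np.sin(t)]) for t in range(50)]
print(rollout_dac(A, B, K, M, np.zeros(1), ws))
```

A dynamic-regret comparator in the abstract's sense lets the matrices $M_t$ vary over time; the algorithm's guarantee is measured against the best such time-varying sequence, with the difficulty of tracking it captured by its total variation.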