Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm (2103.09847v1)

Published 17 Mar 2021 in cs.LG, cs.AI, math.OC, and stat.ML

Abstract: In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q=\gamma^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.
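To make the scaling of the two bounds concrete, here is a minimal Python sketch that evaluates each expression numerically. It assumes every hidden constant in $\Omega(\cdot)$ and $O(\cdot)$ equals 1 and replaces the $\Theta\left(d\gamma^{2}\right)$ exponent with $c\,d\gamma^{2}$ for a user-chosen constant $c$ (default 1). These constants, the function names, and the example parameter values are illustrative assumptions, not taken from the paper, so the outputs show orders of growth only, not actual sample sizes. The sketch also shows why $q=\gamma^{2}$ rules out any finite sample size: the factor $q-\gamma^{2}$ in the denominator of the lower bound becomes zero.

```python
import math


def in_hard_regime(d: int, gamma: float) -> bool:
    """Hard regime from the abstract: d * gamma^2 > 1."""
    return d * gamma ** 2 > 1


def lower_bound_scale(d: int, gamma: float, q: float, eps: float, c: float = 1.0) -> float:
    """Order of the Omega(...) lower bound.

    Assumption: hidden constants are set to 1 and Theta(d * gamma^2)
    is replaced by c * d * gamma^2.
    """
    if not (gamma ** 2 <= q <= 1):
        raise ValueError("q must lie in [gamma^2, 1]")
    gap = q - gamma ** 2
    if gap <= 0:
        # q = gamma^2: the denominator vanishes, so no finite sample size suffices.
        return math.inf
    return d / (gamma ** 2 * gap * eps ** 2) * math.exp(c * d * gamma ** 2)


def upper_bound_scale(theta_norm: float, d: int, eps: float, delta: float) -> float:
    """Order of the O(max{...}) upper bound (low distribution shift case).

    Assumption: the hidden constant is set to 1; theta_norm stands for ||theta^pi||_2.
    """
    term_norm = theta_norm ** 4 / eps ** 4 * math.log(d / delta)
    term_dim = (d + math.log(1 / delta)) / eps ** 2
    return max(term_norm, term_dim)


if __name__ == "__main__":
    gamma, q, eps, delta = 0.9, 0.9, 0.1, 0.05
    for d in (10, 20, 40):
        print(
            d,
            in_hard_regime(d, gamma),
            f"{lower_bound_scale(d, gamma, q, eps):.3e}",
            f"{upper_bound_scale(1.0, d, eps, delta):.3e}",
        )
```

For example, with `gamma = 0.9`, `q = 0.9` and `eps = 0.1`, doubling `d` from 10 to 20 multiplies `lower_bound_scale` by $2e^{8.1}$ (roughly 6,600), while `upper_bound_scale` with $\left\Vert \theta^{\pi}\right\Vert _{2}=1$ grows only through the $\log(d/\delta)$ and $d$ terms. The two bounds hold under different assumptions (a constructed hard instance versus low distribution shift), so the comparison illustrates how each expression scales, not a gap between matching bounds.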

