Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data (2306.14063v2)

Published 24 Jun 2023 in cs.LG and cs.AI

Abstract: Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable. Currently, most results hinge on unrealistic assumptions about the data distribution -- namely that it comprises a set of i.i.d. trajectories collected by a single logging policy. We consider a more general setting where the dataset may have been gathered adaptively. We develop theory for the TMIS Offline Policy Evaluation (OPE) estimator in this generalized setting for tabular MDPs, deriving high-probability, instance-dependent bounds on its estimation error. We also recover minimax-optimal offline learning in the adaptive setting. Finally, we conduct simulations to empirically analyze the behavior of these estimators under adaptive and non-adaptive regimes.
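
For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of a tabular marginalized importance sampling (TMIS) style estimate for a finite-horizon MDP. It is written from the general description of TMIS in the offline RL literature and is not the authors' code. The function name `tmis_estimate`, the trajectory format, and the choice to assign zero to unvisited state-action pairs are all assumptions made for illustration. The sketch builds count-based estimates of the transitions and rewards at each step, propagates the target policy's state distribution forward through them, and sums the expected rewards.

```python
import numpy as np

def tmis_estimate(trajectories, policy, n_states, n_actions, horizon):
    """Hypothetical TMIS-style value estimate for a tabular, finite-horizon MDP.

    trajectories: list of trajectories, each a list of (s, a, r, s_next)
                  tuples with one tuple per step h = 0..horizon-1.
    policy:       array of shape (horizon, n_states, n_actions) holding the
                  target policy's probabilities pi_h(a | s).
    """
    H, S, A = horizon, n_states, n_actions
    trans_counts = np.zeros((H, S, A, S))  # n_{h,s,a,s'}
    reward_sums = np.zeros((H, S, A))
    init_counts = np.zeros(S)

    for traj in trajectories:
        init_counts[traj[0][0]] += 1
        for h, (s, a, r, s_next) in enumerate(traj):
            trans_counts[h, s, a, s_next] += 1
            reward_sums[h, s, a] += r

    visits = trans_counts.sum(axis=-1)      # n_{h,s,a}
    safe = np.maximum(visits, 1)            # avoid division by zero
    # Assumption: unvisited (h, s, a) triples get zero transition mass and
    # zero reward, so probability mass routed through them is dropped.
    p_hat = trans_counts / safe[..., None]
    r_hat = reward_sums / safe

    d = init_counts / init_counts.sum()     # estimated initial state distribution
    value = 0.0
    for h in range(H):
        sa = d[:, None] * policy[h]         # estimated state-action marginal under pi
        value += float((sa * r_hat[h]).sum())
        d = np.einsum("sa,sat->t", sa, p_hat[h])  # next-step state marginal
    return value
```

The same computation applies whether the trajectories were logged by one fixed policy or gathered adaptively; what changes in the adaptive case is the statistical analysis of the counts, which is the subject of the paper.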

Authors: Sunil Madhow, Dan Qiao, Ming Yin, Yu-Xiang Wang
