A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs (2402.04493v2)
Abstract: We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, where the goal is to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $O(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting to achieve a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption. Our work improves upon a recent result that requires $O(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
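To make the primal-dual idea behind constrained RL concrete, the following is a minimal, generic sketch (not the paper's algorithm): a policy over a finite action set is optimized to maximize expected reward subject to a constraint on an auxiliary reward signal, by alternating a primal (policy) ascent step on the Lagrangian with a dual (Lagrange multiplier) descent step. All names, step sizes, and the toy problem itself are illustrative assumptions.

```python
import numpy as np

# Toy primal-dual (Lagrangian) illustration of constrained policy optimization.
# NOT the paper's algorithm; problem data and step sizes are made up for illustration.
rng = np.random.default_rng(0)
n_actions = 5
reward = rng.uniform(0.0, 1.0, size=n_actions)    # main reward r(a)
utility = rng.uniform(0.0, 1.0, size=n_actions)   # constrained auxiliary signal g(a)
threshold = 0.6                                    # require E_pi[g] >= threshold

eta_pi, eta_lam = 0.5, 0.05                        # primal / dual step sizes
logits = np.zeros(n_actions)                       # primal variable (softmax policy)
lam = 0.0                                          # dual variable (Lagrange multiplier)

for t in range(2000):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    # Lagrangian payoff of each action: r(a) + lam * g(a)
    payoff = reward + lam * utility
    # Primal ascent: exponentiated-gradient (mirror ascent) step on the policy
    logits += eta_pi * payoff
    # Dual descent: raise lam when the constraint E_pi[g] >= threshold is violated
    lam = max(0.0, lam - eta_lam * (pi @ utility - threshold))

print("policy:", np.round(pi, 3))
print("E[reward] =", round(float(pi @ reward), 3),
      "E[utility] =", round(float(pi @ utility), 3),
      "(threshold", threshold, ")")
```

In this sketch, the dual variable grows whenever the auxiliary-reward constraint is violated, which shifts the primal player toward constraint-satisfying actions; this alternating structure is the generic Lagrangian mechanism that primal-dual methods for constrained MDPs build on.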