Policy-regularized Offline Multi-objective Reinforcement Learning (2401.02244v1)
Abstract: In this paper, we aim to train a policy for multi-objective reinforcement learning (MORL) using only offline trajectory data. To this end, we extend the policy-regularized method, a widely adopted approach to single-objective offline RL, to the multi-objective setting. However, such methods face a new challenge in offline MORL: the preference-inconsistent demonstration problem. We propose two solutions: 1) filtering out preference-inconsistent demonstrations by approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL, so that a single policy network can learn a set of policies simultaneously, reducing the computational cost of training a large number of individual policies for different preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.
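To make the core idea concrete, below is a minimal PyTorch sketch of a preference-conditioned, policy-regularized update with linear scalarization, in the spirit of TD3+BC, together with a simple behavior-preference filter. This is a sketch under stated assumptions, not the paper's actual implementation: the names (PrefPolicy, filter_by_behavior_preference), the vector-valued critic interface, and the normalized-return approximation of behavior preference are all illustrative choices.

```python
# Minimal sketch, assuming PyTorch, a vector-valued critic, and linear
# scalarization. Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class PrefPolicy(nn.Module):
    """Deterministic policy conditioned on state, target preference, and a
    regularization weight, so one network serves many preferences."""

    def __init__(self, state_dim, action_dim, n_objectives, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, pref, reg_weight):
        return self.net(torch.cat([state, pref, reg_weight], dim=-1))


def policy_loss(policy, critic, batch, pref, reg_weight, alpha=2.5):
    """Preference-conditioned scalarized update with behavior-cloning
    regularization (TD3+BC style): maximize preference-weighted Q while
    staying close to dataset actions."""
    state, data_action = batch["state"], batch["action"]
    action = policy(state, pref, reg_weight)
    q_vec = critic(state, action, pref)             # (B, n_objectives)
    scalar_q = (q_vec * pref).sum(dim=-1)           # linear scalarization
    lam = alpha / scalar_q.abs().mean().detach()    # normalize Q scale
    bc = ((action - data_action) ** 2).sum(dim=-1)  # regularizer
    return (-lam * scalar_q + reg_weight.squeeze(-1) * bc).mean()


def filter_by_behavior_preference(trajectories, target_pref, thresh=0.25):
    """Drop trajectories whose approximated behavior preference is far
    from the target. The normalized vector return is used here as a crude
    stand-in for the true (unknown) behavior preference."""
    kept = []
    for traj in trajectories:
        ret = traj["returns"]                       # per-objective return
        behavior_pref = ret / (ret.sum() + 1e-8)
        if (behavior_pref - target_pref).abs().sum() < thresh:
            kept.append(traj)
    return kept
```

Under this reading, Regularization Weight Adaptation would amount to treating the reg_weight input as a deployment-time knob: for each target preference, candidate weights are evaluated (e.g., under the learned critic) and the best-scoring one is kept, rather than fixing a single weight for all preferences.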
Authors: Qian Lin, Chao Yu, Zongkai Liu, Zifan Wu