Boosting Continuous Control with Consistency Policy (2310.06343v2)
Abstract: Owing to its training stability and strong expressiveness, the diffusion model has attracted considerable attention in offline reinforcement learning. However, it also brings several challenges: 1) the large number of diffusion steps required makes diffusion-model-based methods time-inefficient and limits their use in real-time control; 2) how to achieve policy improvement with accurate guidance for a diffusion-model-based policy remains an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives an action from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance that arise when updating a diffusion-model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning, and that it can be seamlessly extended to online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, while improving inference speed by nearly 45 times compared to Diffusion-QL. We will release our code later.
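The single-step generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the boundary-condition parameterization from the Consistency Models paper, f(x, t) = c_skip(t) x + c_out(t) F(x, t), with assumed constants `SIGMA_DATA`, `EPS` and `T_MAX`. The function `toy_network` is a hypothetical stand-in for a trained, state-conditioned network.

```python
import numpy as np

# Assumed constants, following common consistency-model defaults.
SIGMA_DATA = 0.5   # assumed scale of the action data
EPS = 0.002        # assumed smallest noise level
T_MAX = 80.0       # assumed largest noise level


def c_skip(t):
    """Skip-connection weight; equals 1 at t = EPS."""
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    """Network-output weight; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)


def toy_network(state, noisy_action, t):
    """Hypothetical placeholder for a trained network F(state, x, t)."""
    w = np.full((state.shape[-1], noisy_action.shape[-1]), 0.1)
    return np.tanh(state @ w + 0.01 * noisy_action)


def consistency_policy(state, noisy_action, t):
    """Map a noisy action at noise level t to a clean action estimate."""
    return c_skip(t) * noisy_action + c_out(t) * toy_network(state, noisy_action, t)


def sample_action(state, action_dim, rng):
    """Draw Gaussian noise at level T_MAX and denoise it in one call."""
    noise = rng.standard_normal(action_dim) * T_MAX
    action = consistency_policy(state, noise, T_MAX)
    # Clipping to [-1, 1] assumes a normalized action space.
    return np.clip(action, -1.0, 1.0)
```

At t = `EPS` the parameterization returns its input unchanged, which is the boundary condition a consistency function must satisfy. A multi-step diffusion policy would instead call the network once per denoising step, which is the inference cost the abstract refers to.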