
Boosting Continuous Control with Consistency Policy (2310.06343v2)

Published 10 Oct 2023 in cs.LG

Abstract: Due to its training stability and strong expressiveness, the diffusion model has attracted considerable attention in offline reinforcement learning. However, it also brings several challenges: 1) the large number of diffusion steps required makes diffusion-model-based methods time-inefficient and limits their application in real-time control; 2) how to achieve policy improvement with accurate guidance for a diffusion-model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives actions from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating a diffusion-model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning, and can be seamlessly extended to online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, improving inference speed by nearly 45 times compared to Diffusion-QL. We will release our code later.
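
To make the "action from noise in a single step" idea concrete, here is a minimal sketch of single-step sampling with a consistency-model-style parameterization. It is not the authors' implementation: the constants `SIGMA_DATA`, `EPS` and `T_MAX`, the untrained toy network, and every function name are assumptions chosen for illustration, and no Q-learning guidance is included.

```python
import math
import random

# Illustrative constants (assumptions, not taken from the paper).
SIGMA_DATA = 0.5   # assumed scale of the action data
EPS = 0.002        # assumed smallest noise level
T_MAX = 80.0       # assumed largest noise level


def c_skip(t):
    """Skip-connection weight; equals 1 at t = EPS."""
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t):
    """Network-output weight; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def make_toy_network(state_dim, action_dim, hidden=8, seed=0):
    """Hypothetical stand-in for a trained network F(state, noisy_action, t)."""
    rng = random.Random(seed)
    in_dim = state_dim + action_dim + 1
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(in_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(action_dim)] for _ in range(hidden)]

    def network(state, noisy_action, t):
        x = list(state) + list(noisy_action) + [math.log(t)]
        h = [math.tanh(sum(x[i] * w1[i][j] for i in range(in_dim)))
             for j in range(hidden)]
        return [sum(h[j] * w2[j][k] for j in range(hidden))
                for k in range(action_dim)]

    return network


def consistency_policy(network, state, noisy_action, t):
    """f(s, a_t, t) = c_skip(t) * a_t + c_out(t) * F(s, a_t, t).

    The weights enforce the boundary condition f(s, a, EPS) = a.
    """
    out = network(state, noisy_action, t)
    return [c_skip(t) * a + c_out(t) * o for a, o in zip(noisy_action, out)]


def sample_action(network, state, action_dim, rng):
    """Draw Gaussian noise at the largest level and map it to an action
    with one network evaluation (no iterative denoising loop)."""
    noise = [rng.gauss(0.0, T_MAX) for _ in range(action_dim)]
    action = consistency_policy(network, state, noise, T_MAX)
    # Clip to an assumed action range of [-1, 1].
    return [max(-1.0, min(1.0, a)) for a in action]
```

Under these assumptions, `sample_action(make_toy_network(3, 2), [0.0, 0.0, 0.0], 2, random.Random(1))` returns a two-dimensional action after a single forward pass, whereas a diffusion policy would call its network once per denoising step.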
