Is Conditional Generative Modeling all you need for Decision-Making? (2211.15657v4)
Abstract: Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional diffusion model, we illustrate how we may circumvent the need for dynamic programming and subsequently eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making.
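The sketch below illustrates the core idea described in the abstract, but it is not the authors' implementation: a return-conditioned diffusion model over actions, trained with conditioning dropout so that classifier-free guidance can steer sampling toward a target return at test time. All dimensions, the cosine noise schedule, the guidance weight `w`, and the use of a zeroed-out return as the "unconditional" input (rather than a learned null embedding) are illustrative assumptions.

```python
# Hedged sketch of a return-conditional diffusion policy (PyTorch), not the paper's code.
import math
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, T = 17, 6, 100  # hypothetical sizes and number of diffusion steps

class Denoiser(nn.Module):
    """Predicts the noise added to an action, given state, return-to-go, and timestep."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACTION_DIM + STATE_DIM + 2, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, ACTION_DIM),
        )

    def forward(self, noisy_action, state, ret, t):
        return self.net(torch.cat([noisy_action, state, ret, t], dim=-1))

def alpha_bar(t):
    # Simple cosine noise schedule; t is a tensor of timesteps in [0, 1].
    return torch.cos(0.5 * math.pi * t) ** 2

def train_step(model, opt, state, action, ret, p_drop=0.25):
    """One denoising step; the return condition is randomly dropped (zeroed) so the
    model also learns an unconditional noise estimate, enabling guidance later."""
    t = torch.randint(1, T + 1, (action.shape[0], 1)).float() / T
    ab = alpha_bar(t)
    noise = torch.randn_like(action)
    noisy = ab.sqrt() * action + (1 - ab).sqrt() * noise
    ret = torch.where(torch.rand_like(ret) < p_drop, torch.zeros_like(ret), ret)
    loss = ((model(noisy, state, ret, t) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def act(model, state, target_ret, w=1.2):
    """Sample an action by reverse diffusion, guided toward `target_ret`
    via classifier-free guidance; a deterministic (DDIM-like) update is used here."""
    a = torch.randn(state.shape[0], ACTION_DIM)
    for k in range(T, 0, -1):
        t = torch.full((state.shape[0], 1), k / T)
        eps_c = model(a, state, target_ret, t)                     # conditional estimate
        eps_u = model(a, state, torch.zeros_like(target_ret), t)   # unconditional estimate
        eps = (1 + w) * eps_c - w * eps_u                          # classifier-free guidance
        ab, ab_prev = alpha_bar(t), alpha_bar(t - 1.0 / T)
        a0 = (a - (1 - ab).sqrt() * eps) / ab.sqrt()               # predicted clean action
        a = ab_prev.sqrt() * a0 + (1 - ab_prev).sqrt() * eps       # step toward t-1
    return a
```

As a design note, this sketch conditions only on the return; the constraint- and skill-conditioned variants mentioned in the abstract would swap the return input for a constraint or skill encoding, and composition at test time would combine several conditional noise estimates in the guidance step.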