
Is Conditional Generative Modeling all you need for Decision-Making?

(arXiv:2211.15657)
Published Nov 28, 2022 in cs.LG and cs.AI

Abstract

Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional diffusion model, we illustrate how we may circumvent the need for dynamic programming and subsequently eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test-time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making.
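To make the recipe in the abstract concrete, the snippet below sketches one common way to train a return-conditional diffusion model on offline data: a denoiser predicts the noise added to trajectory data, conditioned on the diffusion step and the trajectory return, with the return condition randomly dropped during training so that classifier-free guidance can steer sampling toward high returns at test time. This is a minimal, illustrative sketch only; the network (`DenoiserMLP`), the noise schedule, the conditioning scheme, and all tensor shapes are assumptions made for this example and are not the paper's actual architecture or training setup.

```python
# Minimal sketch (assumed names and shapes) of training a return-conditioned
# diffusion model with conditioning dropout for classifier-free guidance.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 100                                      # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)        # simple linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, 0)   # cumulative product \bar{alpha}_t

class DenoiserMLP(nn.Module):
    """Predicts the noise added to (flattened) trajectory data x_t,
    conditioned on the diffusion step t and the trajectory return R."""
    def __init__(self, x_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + 2, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, t, ret):
        # Concatenate the (normalized) step index and return as extra features.
        cond = torch.stack([t.float() / T, ret], dim=-1)
        return self.net(torch.cat([x_t, cond], dim=-1))

def q_sample(x0, t, noise):
    """Forward diffusion: corrupt clean data x0 to noise level t."""
    ab = alphas_bar[t].unsqueeze(-1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def train_step(model, opt, x0, ret, p_uncond=0.25):
    """One step of conditional denoising: with probability p_uncond the
    return condition is zeroed out, enabling classifier-free guidance later."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    mask = (torch.rand(b) > p_uncond).float()
    loss = F.mse_loss(model(x_t, t, ret * mask), noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random data standing in for an offline dataset.
x_dim = 16
model = DenoiserMLP(x_dim)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
x0 = torch.randn(64, x_dim)    # placeholder "trajectories"
ret = torch.rand(64)           # returns normalized to [0, 1]
print(train_step(model, opt, x0, ret))
```

At sampling time, one would typically run the reverse diffusion chain while conditioning on a high target return and mixing conditional and unconditional noise predictions (classifier-free guidance); conditioning on constraint or skill labels in place of returns, and combining several of them at test time, follows the same pattern.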

