Emergent Mind

Training Diffusion Models with Reinforcement Learning

(2305.13301)
Published May 22, 2023 in cs.LG, cs.AI, and cs.CV

Abstract

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we refer to as denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO is able to adapt text-to-image diffusion models to objectives that are difficult to express via prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model without the need for additional data collection or human annotation. The project's website can be found at http://rl-diffusion.github.io.
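
The idea of treating denoising as a multi-step decision-making problem can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a scalar "image", a hypothetical one-parameter denoiser (`theta`, a learned drift applied at every step), a made-up reward that prefers final samples near `target`, and a plain score-function (REINFORCE) estimator with a mean-reward baseline as the policy gradient. All function names and hyperparameters are illustrative assumptions.

```python
import random

def sample_trajectory(theta, T, sigma, rng):
    """Run T toy denoising steps.

    Returns the final sample x_0 and the derivative with respect to theta
    of the summed per-step log-probabilities (the trajectory score).
    """
    x = rng.gauss(0.0, 1.0)  # x_T: start from pure noise
    score = 0.0
    for _ in range(T):
        mean = x + theta  # hypothetical denoiser: shift the state by theta
        x_next = rng.gauss(mean, sigma)  # each step is a stochastic action
        # d/dtheta of log N(x_next; mean, sigma^2)
        score += (x_next - mean) / sigma ** 2
        x = x_next
    return x, score

def reward(x0, target=2.0):
    """Assumed downstream objective: final samples should land near target."""
    return -(x0 - target) ** 2

def train(steps=200, batch=512, T=5, sigma=0.5, lr=0.01, seed=0):
    """Gradient ascent on expected reward using the score-function estimator."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        samples = [sample_trajectory(theta, T, sigma, rng) for _ in range(batch)]
        rewards = [reward(x0) for x0, _ in samples]
        baseline = sum(rewards) / batch  # mean reward, used to reduce variance
        grad = sum((r - baseline) * s
                   for r, (_, s) in zip(rewards, samples)) / batch
        theta += lr * grad
    return theta

if __name__ == "__main__":
    print(train())
```

In this toy setting the expected final sample is `T * theta`, so the learned drift should settle near `target / T`. The reward is only evaluated on the final sample, yet every denoising step receives a gradient signal through its log-probability, which is the property the decision-making view relies on.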

