Large-scale Reinforcement Learning for Diffusion Models

(arXiv:2401.12244)
Published Jan 20, 2024 in cs.CV, cs.AI, and cs.LG

Abstract

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale text-image training pairs and may inaccurately model aspects of images we care about. This can result in suboptimal samples, model bias, and images that do not align with human ethics and preferences. In this paper, we present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL) across a diverse set of reward functions, such as human preference, compositionality, and fairness over millions of images. We illustrate how our approach substantially outperforms existing methods for aligning diffusion models with human preferences. We further illustrate how this substantially improves pretrained Stable Diffusion (SD) models, generating samples that are preferred by humans 80.3% of the time over those from the base SD model while simultaneously improving both the composition and diversity of generated samples.

Figure: Progression of sample quality in text-to-image models via multi-task reinforcement learning, reducing bias and stereotypes.

Overview

  • The paper discusses a novel RL framework to optimize diffusion models at large scales, handling millions of prompts and complex reward functions.

  • The proposed method treats the diffusion model's denoising process as a multi-step MDP and uses policy gradients with importance sampling for efficiency (an illustrative form of the resulting objective is sketched after this list).

  • The training loss retains the original diffusion-model objective to avoid over-optimization, and rewards are normalized to keep training stable across multiple objectives.

  • Experiments show that the RL-enhanced model outperforms the base model, aligning more closely with human preferences and ethical standards.

  • The approach mitigates the 'alignment tax' (where optimizing for one objective degrades performance on others), potentially guiding the development of more sophisticated and ethically aware AI image generation tools.
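
As an illustrative sketch (assumed notation, not necessarily the paper's exact formulation), the clipped, importance-sampled objective over the T denoising steps referenced above can be written as:

```latex
% Illustrative notation: state s_t = (c, t, x_t), action a_t = x_{t-1},
% terminal reward r(x_0, c) with batch-normalized value \hat{A}.
\[
  \rho_t(\theta) =
    \frac{p_\theta(x_{t-1} \mid x_t, c)}{p_{\theta_{\mathrm{old}}}(x_{t-1} \mid x_t, c)},
  \qquad
  \mathcal{L}_{\mathrm{RL}}(\theta) =
    -\,\mathbb{E}\!\left[\sum_{t=1}^{T}
      \min\!\Bigl(\rho_t(\theta)\,\hat{A},\;
      \operatorname{clip}\bigl(\rho_t(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}\Bigr)\right]
\]
\[
  \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{RL}}(\theta) + \beta\,\mathcal{L}_{\mathrm{diffusion}}(\theta)
\]
```

Here the clipping width ε and the weight β on the original diffusion objective are placeholders chosen for illustration; the paper's exact weighting and normalization scheme may differ.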

Background

The capabilities of text-to-image diffusion models have advanced drastically, producing highly creative and photorealistic images. Reinforcement Learning (RL) is increasingly used to fine-tune these models, improving them with respect to various reward functions such as human aesthetic preferences and ethical considerations. While there have been numerous efforts to enhance diffusion models, many existing methods are limited by the scale of prompts they can handle or by the specificity of the reward functions they can optimize.

Method

The paper introduces a reinforcement learning framework that can be applied to diffusion models at large scale, optimizing over millions of prompts and a diverse set of highly complex reward functions. The approach treats the iterative denoising process of a diffusion model as a multi-step Markov decision process (MDP), defining the state, action, policy, and reward at each denoising step. Policy gradients over this multi-step MDP are estimated with the likelihood-ratio method, and importance sampling allows multiple optimization steps per batch of sampled trajectories for efficiency. To keep each policy update from deviating too far from the sampling policy, a clipped trust-region term is incorporated. The training loss also includes the original diffusion-model objective to guard against over-optimization, and rewards are normalized to improve stability.
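
A minimal PyTorch-style sketch of such a training loss is shown below, assuming per-step log-probabilities of the sampled denoising trajectory are available; the tensor names and hyperparameter values (clip_eps, pretrain_weight) are illustrative assumptions rather than the authors' implementation.

```python
import torch

def rl_finetune_loss(
    log_probs_new,       # [B, T] log p_theta(x_{t-1} | x_t, c) under the current policy
    log_probs_old,       # [B, T] same quantities under the (frozen) sampling policy
    rewards,             # [B]    scalar reward per generated image (e.g. a preference score)
    diffusion_loss,      # []     standard denoising loss on pretraining data
    clip_eps=0.1,        # trust-region width for the clipped ratio (assumed value)
    pretrain_weight=1.0, # weight on the original diffusion objective (assumed value)
):
    """Sketch of a clipped, importance-sampled policy-gradient loss over the denoising MDP.

    Each of the T denoising steps is treated as one action, and the normalized
    terminal reward is broadcast to every step.
    """
    # Normalize rewards across the batch for stability.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # [B]
    adv = adv[:, None]                                          # broadcast over the T steps

    # Importance ratio between the current and sampling policies, per denoising step.
    ratio = torch.exp(log_probs_new - log_probs_old)            # [B, T]

    # Clipped surrogate: keeps the updated policy within a trust region of the old one.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    policy_loss = -torch.mean(torch.min(unclipped, clipped))

    # Retain the original diffusion objective to guard against reward over-optimization.
    return policy_loss + pretrain_weight * diffusion_loss
```

In practice the reward would come from a task-specific scorer (for example, a learned preference model or a fairness metric), and the diffusion term would be the usual noise-prediction loss computed on pretraining data.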

Results

Extensive experiments validate the effectiveness of the proposed RL framework, covering human preference, compositional image generation, and fairness/diversity objectives. The results show substantial improvements over the base Stable Diffusion model, with human evaluators expressing a clear preference for images generated by the RL-enhanced model. Notably, through multi-task joint training, a single model was able to balance this diverse set of objectives, improving on each target reward while keeping the other essential qualities above acceptable levels.

Implications

This work illustrates the potential of RL to fine-tune diffusion models so that they align closely with nuanced human preferences, ethical standards, and compositional diversity. The scalability of the framework means that generative diffusion models can be refined across millions of prompts and images, enabling reinforcement learning at web scale. Given its success in limiting the 'alignment tax' (where optimizing for one metric may degrade performance on others), this approach sets the stage for future work on fine-tuning generative models across a range of applications. The work demonstrates not just a refinement in image generation quality, but also a pathway toward more ethically aware AI image generation systems.
