Adversarial Diffusion Distillation

(arXiv:2311.17042)
Published Nov 28, 2023 in cs.CV

Abstract

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models. Code and weights available under https://github.com/Stability-AI/generative-models and https://huggingface.co/stabilityai/.

Overview

  • Introduces Adversarial Diffusion Distillation (ADD) for efficient image generation from pre-trained diffusion models with few steps.

  • Combines adversarial loss and score distillation loss to create high-quality images similar to those from a high-performing teacher model.

  • Utilizes a student model, a discriminator, and a teacher model to reduce generation steps while maintaining image fidelity.

  • Retains the option of iterative refinement over additional steps for applications that benefit from incremental improvement.

  • ADD surpasses existing few-step methods and has potential for real-time high-quality image synthesis.

Introduction to Adversarial Diffusion Distillation (ADD)

In the rapidly evolving domain of generative modeling, particularly image synthesis, diffusion models (DMs) have emerged as a powerful technique, achieving notable success in generating high-quality images from text descriptions. However, DMs are iterative by design and typically require many sampling steps to produce an image, which limits their applicability in real-time scenarios.

A Streamlined Approach for Image Generation

The Adversarial Diffusion Distillation (ADD) methodology introduced in this paper aims to condense the image generation process of a pre-trained diffusion model to as few as 1-4 steps without compromising the high quality of the resulting images. The technique combines an adversarial loss, which compels the model to produce images indistinguishable from real ones, with a score distillation loss. This secondary loss encourages the model to emulate the output of an existing, high-performing diffusion model, referred to as the teacher.
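In code, the combined objective might be sketched as follows. This is a minimal, hypothetical PyTorch-style sketch, not the paper's implementation: the interfaces of `student`, `teacher`, `discriminator`, and `noise_sched`, as well as the weighting term `lambda_distill`, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def add_loss(student, teacher, discriminator, x_real, noise_sched, lambda_distill=1.0):
    """Hypothetical sketch of the ADD objective: adversarial loss + score distillation."""
    # 1) Noise a real image and let the student denoise it in a single step.
    t_s = noise_sched.sample_student_timestep(x_real.shape[0])  # assumed helper
    x_noisy = noise_sched.add_noise(x_real, t_s)                # assumed helper
    x_student = student(x_noisy, t_s)  # one-step prediction of the clean image

    # 2) Adversarial loss: push the discriminator's score on the student's
    #    output toward "real" (hinge-style generator loss assumed here).
    adv_loss = -discriminator(x_student).mean()

    # 3) Score distillation loss: re-noise the student's output, let the frozen
    #    teacher denoise it, and pull the student toward that target.
    t_d = noise_sched.sample_teacher_timestep(x_real.shape[0])  # assumed helper
    x_renoised = noise_sched.add_noise(x_student, t_d)
    with torch.no_grad():
        x_teacher = teacher(x_renoised, t_d)  # teacher target, no gradient

    distill_loss = F.mse_loss(x_student, x_teacher)
    return adv_loss + lambda_distill * distill_loss
```

The key design point is that the gradient stops at the teacher's output, so the teacher acts purely as a target while the student receives gradients from both loss terms.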

Training Dynamics

The ADD process incorporates a student model initialized from a pre-trained U-Net diffusion model, a discriminator, and a frozen teacher model. The primary goal is to generate images whose fidelity matches that of traditional multi-step diffusion processes, but in a drastically reduced number of steps. The student model is trained with an adversarial loss, implemented through a discriminator that distinguishes generated images from real ones. In parallel, the score distillation loss leverages the pre-trained teacher model, utilizing its knowledge to shape the student's outputs. The ability to refine the generated images through iterative steps is retained for applications where incremental enhancement is desired.
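Training alternates between updating the discriminator and the student, as in standard GAN training. The sketch below reuses the hypothetical `add_loss` and `noise_sched` interfaces from the previous snippet and assumes a hinge loss for the discriminator; it is an illustrative outline rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def training_step(student, teacher, discriminator,
                  opt_student, opt_disc, x_real, noise_sched):
    """One hypothetical ADD iteration: discriminator update, then student update."""
    # --- Discriminator update: distinguish real images from student outputs.
    t = noise_sched.sample_student_timestep(x_real.shape[0])
    x_noisy = noise_sched.add_noise(x_real, t)
    with torch.no_grad():
        x_fake = student(x_noisy, t)  # detached: only the discriminator learns here
    d_loss = (F.relu(1.0 - discriminator(x_real)).mean()     # hinge loss, reals
              + F.relu(1.0 + discriminator(x_fake)).mean())  # hinge loss, fakes
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # --- Student update: combined adversarial + distillation objective.
    g_loss = add_loss(student, teacher, discriminator, x_real, noise_sched)
    opt_student.zero_grad()
    g_loss.backward()
    opt_student.step()
    return d_loss.item(), g_loss.item()
```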

Superior Performance and Real-Time Potential

Analyses show that ADD surpasses existing few-step methods such as GANs and Latent Consistency Models, demonstrating superior image quality and compositional ability even in single-step generation. When allowed up to four sampling steps, the approach reaches the performance of state-of-the-art diffusion models such as SDXL, paving the way for real-time image synthesis with foundation models. ADD represents a significant breakthrough in generating high-quality images rapidly, potentially unlocking new applications that require instantaneous visual content creation.
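In practice, the four-step setting amounts to running the student several times, re-noising its output between steps. A minimal sketch of such a few-step sampling loop, with a hypothetical `spaced_timesteps` helper standing in for the model's actual schedule, might look like this:

```python
import torch

@torch.no_grad()
def sample(student, noise_sched, shape, num_steps=4, device="cuda"):
    """Hypothetical few-step sampling loop for a distilled ADD student."""
    x = torch.randn(shape, device=device)  # start from pure noise
    timesteps = noise_sched.spaced_timesteps(num_steps)  # high -> low noise levels
    for i, t in enumerate(timesteps):
        x0 = student(x, t)  # one-step prediction of the clean image
        if i < len(timesteps) - 1:
            # Re-noise the prediction to the next, lower noise level and repeat.
            x = noise_sched.add_noise(x0, timesteps[i + 1])
        else:
            x = x0
    return x
```

With `num_steps=1` this reduces to a single forward pass, which is what makes real-time synthesis feasible.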
