- The paper introduces ADD, a method that condenses traditional multi-step diffusion processes to as few as 1–4 steps without sacrificing image quality.
- The approach leverages a student model, discriminator, and teacher model in a novel training paradigm that fuses adversarial and score distillation losses.
- Experiments show that ADD achieves superior compositional performance compared to existing few-step methods, unlocking potential for real-time image synthesis.
Introduction to Adversarial Diffusion Distillation (ADD)
In the rapidly evolving domain of generative modeling, particularly image synthesis, diffusion models (DMs) have emerged as a powerful technique, achieving notable success in generating high-quality images from text descriptions. However, traditional DMs sample iteratively, typically requiring many sequential denoising steps to create a single image, which limits their applicability in real-time scenarios.
A Streamlined Approach for Image Generation
The Adversarial Diffusion Distillation (ADD) methodology introduced in this paper condenses the image generation process of a pre-trained diffusion model to as few as 1–4 steps without compromising the high quality of the resulting images. The technique combines an adversarial loss, which compels the model to produce images indistinguishable from real ones, with a score distillation loss that encourages the model to emulate the output of an existing, high-performing diffusion model referred to as the teacher.
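As a rough illustration of how the two terms combine, the sketch below writes the student objective as a weighted sum in PyTorch. The function name `add_objective`, the hinge-style generator loss, the plain MSE distillation distance, and the weight `lam` are illustrative assumptions, not the paper's exact implementation.

```python
import torch.nn.functional as F

def add_objective(fake, discriminator, teacher_prediction, lam=2.5):
    """Combined ADD student loss: adversarial + weighted score distillation.

    fake:               batch of one-step student samples
    teacher_prediction: the teacher's denoising output for a re-noised
                        version of `fake` (detached so no gradient
                        reaches the teacher)
    lam:                trade-off weight between the two terms (assumed value)
    """
    # Adversarial term: a hinge-style generator loss that rewards
    # samples the discriminator scores as real.
    adv = -discriminator(fake).mean()

    # Score distillation term: pull the student's sample toward the
    # teacher's prediction (plain MSE as a stand-in for the weighted
    # distance used in practice).
    distill = F.mse_loss(fake, teacher_prediction.detach())

    return adv + lam * distill
```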
Training Dynamics
The ADD process incorporates a student model initialized from a pre-trained U-Net diffusion model, a discriminator, and a teacher model. The primary goal is to generate images with fidelity matching those created by traditional multi-step diffusion processes, but in a drastically reduced number of steps. The student model is trained with an adversarial loss, implemented through a discriminator that learns to distinguish generated images from real ones. In parallel, the score distillation loss leverages the frozen, pre-trained teacher model, using its denoising predictions to guide the student's outputs. The ability to refine generated images through additional sampling steps is retained for applications where incremental enhancement is desired (see the sampling sketch at the end of this section).
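Under heavy simplification, one training step might look like the sketch below. The tiny convolutional networks, the linear noising function, the hinge losses, and the fixed noise levels are placeholder assumptions standing in for the pre-trained U-Net student, the frozen teacher, and the discriminator described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in ADD the student is initialized from a pre-trained
# U-Net, the teacher is a frozen high-performing diffusion model, and
# the discriminator separates real images from student samples.
student = nn.Conv2d(3, 3, 3, padding=1)
teacher = nn.Conv2d(3, 3, 3, padding=1).requires_grad_(False)
disc = nn.Sequential(nn.Conv2d(3, 8, 4, stride=2, padding=1),
                     nn.ReLU(), nn.Flatten(), nn.Linear(8 * 16 * 16, 1))

opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
lam = 2.5  # distillation weight (assumed value)

def noised(x0, t):
    """Simplified forward diffusion: blend clean images with noise."""
    return t * torch.randn_like(x0) + (1 - t) * x0

for _ in range(2):  # toy loop; a real run iterates over a dataset
    real = torch.rand(4, 3, 32, 32)  # stand-in image batch
    t = 0.7                          # sampled noise level

    # Student produces a sample in a single denoising step.
    fake = student(noised(real, t))

    # Discriminator update: hinge loss on real vs. (detached) fake.
    d_loss = (F.relu(1 - disc(real)).mean()
              + F.relu(1 + disc(fake.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Student update: adversarial term plus score distillation, where
    # the frozen teacher denoises a re-noised student sample.
    adv = -disc(fake).mean()
    with torch.no_grad():
        target = teacher(noised(fake, 0.5))
    g_loss = adv + lam * F.mse_loss(fake, target)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Note the asymmetry in the two updates: the discriminator only sees detached student samples, while the student's gradient flows through both the adversarial and distillation terms, with the teacher kept entirely out of the gradient path.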
Superior Performance and Real-Time Potential
Experiments show that ADD outperforms existing few-step methods, delivering superior image quality and compositional fidelity even in single-step generation. When allowed up to four sampling steps, the approach surpasses state-of-the-art multi-step diffusion models, paving the way for real-time image synthesis with foundation models. ADD thus represents a significant step toward generating high-quality images rapidly, potentially unlocking new applications that require instantaneous visual content creation.
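For the retained iterative mode, sampling can be sketched as re-noising and re-denoising the student's prediction a few times, as below. The fixed noise schedule and the single-argument `student(...)` call are simplifying assumptions; the actual student is also conditioned on the noise level and the text prompt.

```python
import torch

@torch.no_grad()
def few_step_sample(student, shape, noise_levels=(0.75, 0.5, 0.25)):
    """ADD-style few-step sampling (illustrative schedule).

    One forward pass already yields a usable image; each extra round
    re-noises the current prediction at a lower noise level and
    denoises it again, incrementally refining the sample.
    """
    x = torch.randn(shape)      # start from pure noise
    sample = student(x)         # single-step generation
    for t in noise_levels:      # optional refinement rounds (4 steps total)
        x = t * torch.randn_like(sample) + (1 - t) * sample
        sample = student(x)
    return sample

# e.g. with the toy student above: few_step_sample(student, (1, 3, 32, 32))
```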