Consistency Models

(2303.01469)
Published Mar 2, 2023 in cs.LG, cs.CV, and stat.ML

Abstract

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.

Figure: Single-step denoising with a consistency model; comparison of noisy and denoised images at various noise levels.

Overview

  • The paper introduces Consistency Models, a novel generative modeling approach offering rapid, one-step data generation while preserving quality.

  • Consistency Models build on the Probability Flow ODE to map noise back to the data distribution, with the option of single-step or few-step sampling.

  • Two training methodologies for Consistency Models are presented: distillation from pre-trained diffusion models and independent training without adversarial techniques.

  • The models demonstrate superior performance compared to existing distillation techniques, generating high-quality samples that surpass some GAN models.

  • They offer extensive zero-shot editing capabilities, such as inpainting and super-resolution, without the need for task-specific training.

Introduction to Consistency Models

Diffusion models have markedly improved the landscape of generative modeling, offering remarkable results across image, audio, and video synthesis. However, their inherent need for iterative sampling often results in slow generation, a significant hurdle for real-time applications. Addressing this limitation, the authors develop a new class of generative models called Consistency Models, designed for fast, one-step generation without sacrificing the quality benefits of iterative diffusion processes.

Understanding the Underpinnings

Consistency models operate on the principle that a noise distribution can be mapped directly back to the data distribution, much like diffusion models reverse a process that gradually adds noise to data. The backbone of this reverse mapping is the Probability Flow (PF) ordinary differential equation (ODE), which traditionally must be solved with an iterative numerical solver to generate data from noise.
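For reference, under the EDM-style parameterization the paper builds on (Karras et al., 2022), the PF ODE and the self-consistency property that the models are trained to satisfy can be written as below; ε denotes a small positive noise level at which samples are essentially clean data. This is a hedged reconstruction in the paper's notation, not a verbatim quote of its equations.

```latex
% Probability Flow ODE (EDM-style parameterization), for t in [eps, T]:
\frac{\mathrm{d}\mathbf{x}_t}{\mathrm{d}t} \;=\; -\,t\,\nabla_{\mathbf{x}}\log p_t(\mathbf{x}_t)

% Self-consistency: a consistency function maps every point on the same
% solution trajectory back to the trajectory's origin at t = eps:
f(\mathbf{x}_t, t) \;=\; \mathbf{x}_\epsilon
\quad \text{for all } t \in [\epsilon, T],
\qquad \text{with boundary condition } f(\mathbf{x}_\epsilon, \epsilon) = \mathbf{x}_\epsilon
```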

These models are distinguished by their ability to maintain consistency across points on the same trajectory of the PF ODE: every point on a trajectory is mapped back to the same origin. By learning such mappings, Consistency Models can generate high-quality samples with a single network evaluation, or optionally use multistep sampling to trade compute for sample quality, while retaining the ability to perform zero-shot data editing. A minimal code sketch of this parameterization and the sampling procedures follows.
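The sketch below, in PyTorch-style Python, illustrates the skip-connection parameterization the paper uses to enforce the boundary condition f(x, ε) = x, together with one-step and multistep sampling. Names such as `ConsistencyModel`, `net`, and `sigma_data` are illustrative assumptions, not the authors' reference implementation.

```python
import torch

class ConsistencyModel(torch.nn.Module):
    """Sketch of a consistency function f_theta(x, t) mapping a noisy input
    at noise level t back to an estimate of the clean data point."""

    def __init__(self, net, sigma_data=0.5, eps=0.002):
        super().__init__()
        self.net = net              # any image-to-image network, e.g. a U-Net
        self.sigma_data = sigma_data
        self.eps = eps              # smallest noise level (t = eps ~ clean data)

    def forward(self, x, t):
        # Skip-connection parameterization enforcing f(x, eps) = x:
        # c_skip(eps) = 1 and c_out(eps) = 0.
        c_skip = self.sigma_data**2 / ((t - self.eps)**2 + self.sigma_data**2)
        c_out = self.sigma_data * (t - self.eps) / (t**2 + self.sigma_data**2).sqrt()
        return (c_skip.view(-1, 1, 1, 1) * x
                + c_out.view(-1, 1, 1, 1) * self.net(x, t))

@torch.no_grad()
def sample(model, shape, sigmas, device="cpu"):
    """One-step generation if `sigmas` has a single entry; otherwise the
    alternating denoise/re-noise multistep procedure described in the paper."""
    x = torch.randn(shape, device=device) * sigmas[0]
    t = torch.full((shape[0],), sigmas[0], device=device)
    x = model(x, t)                               # single network evaluation
    for sigma in sigmas[1:]:                      # optional refinement steps
        noise = torch.randn_like(x)
        x_t = x + (sigma**2 - model.eps**2) ** 0.5 * noise   # re-noise
        t = torch.full((shape[0],), sigma, device=device)
        x = model(x_t, t)                         # denoise again
    return x
```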

Training Consistency Models

To effectively train Consistency Models, two main methodologies are introduced: distillation from pre-trained diffusion models and training in isolation. Distillation uses a numerical ODE solver together with a pre-trained diffusion model to pair points along the same PF ODE trajectory, whereas standalone training removes the reliance on pre-trained models altogether, positioning consistency models as an independent family of generative models. Neither approach relies on adversarial training; a sketch of the distillation objective is shown below.
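The following sketch illustrates one consistency distillation step under the paper's setup: a single Euler step of the PF ODE, driven by a frozen pre-trained denoiser, produces two adjacent points on the same trajectory, and the consistency model is trained so that its outputs on the pair agree, with the target branch computed by an EMA copy of the weights. Function names, the MSE distance (the paper also uses LPIPS), and the noise schedule are simplifying assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_step(model, ema_model, score_model, x0, sigmas, opt, mu=0.999):
    """One training step of consistency distillation (sketch).

    model       -- online consistency model f_theta
    ema_model   -- EMA copy f_theta^- used to produce targets
    score_model -- pre-trained diffusion denoiser D(x, t) ~ E[x0 | x_t]
    x0          -- batch of clean training images
    sigmas      -- 1-D tensor of discretized noise levels t_1 < ... < t_N
    """
    B = x0.shape[0]
    n = torch.randint(0, len(sigmas) - 1, (B,), device=x0.device)
    t_n, t_np1 = sigmas[n], sigmas[n + 1]

    # Sample a point on the trajectory at the larger noise level t_{n+1}.
    noise = torch.randn_like(x0)
    x_np1 = x0 + t_np1.view(-1, 1, 1, 1) * noise

    with torch.no_grad():
        # One Euler step of dx/dt = (x - D(x, t)) / t from t_{n+1} down to t_n,
        # using the frozen pre-trained diffusion model as the teacher.
        d = (x_np1 - score_model(x_np1, t_np1)) / t_np1.view(-1, 1, 1, 1)
        x_n = x_np1 + (t_n - t_np1).view(-1, 1, 1, 1) * d
        # For standalone consistency training (no pre-trained model), the Euler
        # step is replaced by x_n = x0 + t_n * noise, reusing the same `noise`.
        target = ema_model(x_n, t_n)              # target from the EMA network

    pred = model(x_np1, t_np1)
    loss = F.mse_loss(pred, target)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Update the EMA (target) network.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(mu).add_(p, alpha=1 - mu)
    return loss.item()
```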

Experimental benchmarks show that consistency distillation notably outperforms existing distillation techniques, such as progressive distillation, in both single-step and few-step sampling. When trained in isolation, these models surpass many existing one-step, non-adversarial generative models, including some GAN variants, on standard benchmarks.

Zero-Shot Editing Capabilities

Beyond generation, Consistency Models facilitate a breadth of zero-shot data editing operations, including inpainting, colorization, super-resolution, and stroke-guided image editing, all without dedicated training for these tasks. The flexibility to trade compute for quality further enhances their suitability for diverse applications in generative modeling; a concrete example of the inpainting case is sketched below.
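As an illustration of the zero-shot editing idea, the sketch below adapts the multistep sampling loop to inpainting: at each step, pixels in the known region are overwritten with the reference image, so the model only has to synthesize the masked region. The function name, mask convention, and noise schedule are assumptions for illustration, and the paper's exact algorithm may differ in detail.

```python
import torch

@torch.no_grad()
def inpaint(model, y, mask, sigmas):
    """Zero-shot inpainting with a trained consistency model (sketch).

    y      -- reference image whose known pixels should be preserved
    mask   -- 1 where pixels are known, 0 where they must be generated
    sigmas -- decreasing noise levels, e.g. [80.0, 24.4, 5.84, 0.9, 0.05]
    """
    B = y.shape[0]
    x = torch.randn_like(y) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        if i > 0:
            # Re-noise the previous estimate up to the current noise level.
            x = x + (sigma**2 - model.eps**2) ** 0.5 * torch.randn_like(x)
        t = torch.full((B,), sigma, device=y.device)
        x = model(x, t)                      # denoise in a single step
        # Keep the known pixels from the reference image, so only the
        # masked region is actually synthesized by the model.
        x = mask * y + (1 - mask) * x
    return x
```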

Conclusion

Consistency Models emerge as a powerful new paradigm in generative modeling, combining the quality benefits of diffusion-based generative models with the efficiency required for practical, real-time applications. This amalgamation of speed and quality, together with their zero-shot editing prowess, positions them as highly versatile tools in the field of AI-driven content creation.
