Emergent Mind

DeepCache: Accelerating Diffusion Models for Free

(2312.00858)
Published Dec 1, 2023 in cs.CV and cs.AI

Abstract

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs, primarily attributed to the sequential denoising process and cumbersome model size. Traditional methods for compressing diffusion models typically involve extensive retraining, presenting cost and feasibility challenges. In this paper, we introduce DeepCache, a novel training-free paradigm that accelerates diffusion models from the perspective of model architecture. DeepCache capitalizes on the inherent temporal redundancy in the sequential denoising steps of diffusion models, caching and retrieving features across adjacent denoising stages and thereby curtailing redundant computations. Utilizing the structure of the U-Net, we reuse the high-level features while updating the low-level features at very low cost. This strategy enables a speedup of 2.3$\times$ for Stable Diffusion v1.5 with only a 0.05 decline in CLIP Score, and 4.1$\times$ for LDM-4-G with a slight decrease of 0.22 in FID on ImageNet. Our experiments also demonstrate DeepCache's superiority over existing pruning and distillation methods that necessitate retraining, as well as its compatibility with current sampling techniques. Furthermore, we find that at the same throughput, DeepCache achieves comparable or even marginally better results than DDIM or PLMS. The code is available at https://github.com/horseee/DeepCache

Comparison of feature maps and their similarity in Stable Diffusion's up-sampling block $U_2$.

Overview

  • DeepCache is a new method to increase image synthesis efficiency in diffusion models without extensive retraining or complex architectures.

  • It exploits temporal redundancy by caching high-level features during the denoising process, reducing redundant computations.

  • Leveraging U-Net architecture's skip connections, DeepCache stores high-level features from full computations for reuse in later steps.

  • DeepCache's non-uniform 1:N strategy selects specific steps for full computation, balancing image quality against inference speed.

  • Empirical evaluations show DeepCache accelerates diffusion models significantly, maintaining or improving image quality without additional training.

DeepCache is a training-free method for improving the efficiency of image synthesis with diffusion models. Traditional strategies for compressing these models typically require extensive retraining on large datasets or complex model architectures, both of which are resource-intensive and time-consuming.

To address this, DeepCache exploits the temporal redundancy inherent in the denoising steps of diffusion models. Because image generation proceeds by iteratively removing noise over many steps, adjacent steps in the sequence share very similar high-level features. By caching these high-level features, DeepCache avoids recomputing them at subsequent steps. The approach leverages the U-Net architecture, whose skip connections retain both low-level and high-level features.
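This redundancy can be made concrete by measuring the cosine similarity between high-level feature maps at adjacent denoising steps. The sketch below uses synthetic tensors as stand-ins for U-Net activations, with an assumed small step-to-step perturbation for illustration; it is not the paper's actual measurement:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two flattened feature maps."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for high-level U-Net features at adjacent denoising steps:
# consecutive steps are assumed to differ only by a small perturbation.
rng = np.random.default_rng(0)
feat_t = rng.standard_normal((64, 8, 8))
feat_t_minus_1 = feat_t + 0.05 * rng.standard_normal((64, 8, 8))

print(cosine_sim(feat_t, feat_t_minus_1))  # close to 1 -> strong temporal redundancy
```

When such similarity is high, recomputing the deep features at every step wastes most of the work, which is exactly the slack DeepCache exploits.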

In practice, DeepCache performs a full forward pass at a selected denoising step and stores the resulting high-level features. At the following steps, only the cheap low-level computations are performed, and the cached high-level features are reused. This substantially reduces the computation per step, speeding up generation without any additional training.
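A minimal sketch of this cache-and-reuse loop, with toy numpy layers standing in for the U-Net's shallow encoder block, expensive deep branch, and shallow decoder (all names, weights, and the fixed cache interval are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W_shallow = rng.standard_normal((8, 8)) * 0.1   # toy shallow encoder weights (cheap)
W_deep = rng.standard_normal((8, 8)) * 0.1      # toy deep-branch weights (expensive part)
W_up = rng.standard_normal((16, 8)) * 0.1       # toy decoder fusing deep features + skip

def shallow_down(x):
    h = np.tanh(x @ W_shallow)   # low-level features
    return h, h                  # (features, skip connection)

def deep_branch(h):
    # Stands in for the remaining down blocks, mid block, and deep up blocks.
    return np.tanh(h @ W_deep)

def shallow_up(deep, skip):
    # Skip connection lets fresh low-level detail meet cached high-level features.
    return np.concatenate([deep, skip], axis=-1) @ W_up

def deepcache_sample(x, num_steps=10, interval=3):
    cache = None
    for i in range(num_steps):
        h, skip = shallow_down(x)            # always recomputed: cheap, tracks detail
        if i % interval == 0 or cache is None:
            cache = deep_branch(h)           # full step: refresh cached deep features
        # partial steps reuse `cache` and skip the deep branch entirely
        x = shallow_up(cache, skip)
    return x

out = deepcache_sample(rng.standard_normal((1, 8)))
print(out.shape)  # (1, 8)
```

With `interval=3`, the expensive deep branch runs on only a third of the steps, which is where the wall-clock savings come from.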

What sets DeepCache apart is its non-uniform 1:N strategy: for every N steps, one full computation refreshes the cache, and the remaining N-1 steps reuse it. Rather than spacing the full-computation steps evenly, the method places them based on each step's relative importance and its similarity to adjacent steps. This mitigates the quality degradation that larger caching intervals would otherwise cause, yielding a better balance between image quality and inference speed.
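One simple way to realize a non-uniform schedule is to space the full-computation steps by a power law so they cluster where features are assumed to change fastest. The parameterization below (the `power` knob and linspace-based spacing) is an illustrative assumption, not the paper's exact formula:

```python
import numpy as np

def nonuniform_full_steps(num_steps, num_full, power=1.2):
    """Pick which denoising steps get a full forward pass (toy illustration).

    Instead of a uniform 1:N grid, cluster full-computation steps near the
    start of sampling, where features are assumed to change fastest.
    `power` > 1 makes the spacing denser near step 0.
    """
    idx = np.linspace(0.0, 1.0, num_full) ** power          # denser near 0
    steps = np.unique(np.round(idx * (num_steps - 1)).astype(int))
    return steps.tolist()

# e.g. 10 full steps out of a 50-step schedule, with tighter gaps early on
print(nonuniform_full_steps(50, 10))
```

All other steps would reuse the most recently cached high-level features, so the schedule directly controls the quality/speed trade-off.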

Empirical evaluations demonstrated that DeepCache could accelerate diffusion models substantially while maintaining comparable, and sometimes even better, image generation quality when compared to methods that required additional training. Experiments were carried out on several models and datasets, showcasing the method's versatility and effectiveness across different image synthesis tasks.

DeepCache reduces the average computation performed per denoising step, accelerating the diffusion process without any retraining. Its compatibility with existing fast samplers, together with its demonstrated image generation quality, makes it a promising, practical way to speed up diffusion-based image synthesis.
