PTQD: Accurate Post-Training Quantization for Diffusion Models

Published 18 May 2023 in cs.CV | (2305.10657v4)

Abstract: Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization (PTQ) of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training. Nonetheless, applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. As the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process. Specifically, we first disentangle the quantization noise into its correlated and residual uncorrelated parts regarding its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme for selecting the optimal bitwidth for each denoising step. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations. Code is available at https://github.com/ziplab/PTQD.

Authors (6)

Citations (66)

Summary

The paper proposes PTQD, a framework that mitigates quantization noise via noise disentanglement and mixed-precision techniques for diffusion models.
It reduces bit operations by 19.9x with only a 0.06 increase in FID relative to full-precision LDM-4 on ImageNet 256x256, maintaining high-quality image synthesis.
The method accelerates sampling by optimizing bitwidth selection across denoising steps, enabling practical deployment on resource-constrained devices.

PTQD: Accurate Post-Training Quantization for Diffusion Models

The paper "PTQD: Accurate Post-Training Quantization for Diffusion Models" presents a framework for quantizing diffusion models after training, with the goal of cutting inference cost without sacrificing image quality. Diffusion models generate high-quality images but are expensive at inference time because sampling requires many iterative denoising steps. The proposed method, PTQD (Post-Training Quantization for Diffusion models), aims to make these models more practical for real-world applications by reducing model size and accelerating sampling without re-training.

Core Contributions and Methodology

The primary contribution of this work is a framework for managing quantization noise, the error introduced when network weights and activations are represented in low-bit formats. Applied directly to low-bit diffusion models, existing PTQ methods noticeably degrade sample quality: at each denoising step the quantization noise shifts the estimated mean and no longer matches the predetermined variance schedule, and the noise can accumulate over the course of sampling. PTQD addresses this with a unified formulation of the quantization noise and the diffusion perturbed noise.

Noise Disentanglement: The authors decompose the quantization noise into a part that is correlated with the full-precision model output and a residual uncorrelated part. This decomposition allows targeted corrections: the correlated part is corrected by estimating the correlation coefficient.
Bias and Variance Correction: For the uncorrelated part, PTQD subtracts the bias from the quantized output to correct the mean deviation, and calibrates the denoising variance schedule so that it absorbs the excess variance introduced by quantization.
Step-aware Mixed Precision: The framework introduces a mixed-precision approach, optimizing the bitwidth selection for different denoising steps. Lower bitwidths accelerate early steps, while higher bitwidths are used in later steps to maintain a high signal-to-noise ratio (SNR).
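
The first two corrections above can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`disentangle_noise`, `correct_output`, `calibrate_variance`) are hypothetical, the correlation coefficient `k` is estimated by an ordinary least-squares line fit, and `coeff_t` stands in for whatever step-dependent factor maps noise-prediction variance into the sampler's variance schedule.

```python
import numpy as np

def disentangle_noise(eps_fp, eps_q):
    """Split quantization noise (eps_q - eps_fp) into a part correlated
    with the full-precision output and an uncorrelated residual.

    Returns (k, residual_mean, residual_var) for the assumed model
        eps_q - eps_fp = k * eps_fp + residual.
    """
    x = np.asarray(eps_fp, dtype=np.float64).ravel()
    delta = np.asarray(eps_q, dtype=np.float64).ravel() - x
    # Least-squares line fit; the slope plays the role of k.
    k, _intercept = np.polyfit(x, delta, 1)
    residual = delta - k * x
    return float(k), float(residual.mean()), float(residual.var())

def correct_output(eps_q, k, residual_mean):
    """Remove the residual bias, then undo the (1 + k) scaling."""
    return (np.asarray(eps_q, dtype=np.float64) - residual_mean) / (1.0 + k)

def calibrate_variance(sigma2_t, residual_var, k, coeff_t):
    """Shrink the scheduled variance so it absorbs the residual variance.

    coeff_t is an assumed, sampler-specific factor; the result is
    clipped at zero when the excess exceeds the scheduled variance.
    """
    excess = coeff_t * residual_var / (1.0 + k) ** 2
    return max(sigma2_t - excess, 0.0)

# Synthetic calibration data (assumed, for illustration only).
rng = np.random.default_rng(0)
eps_fp = rng.standard_normal(10_000)
eps_q = 1.1 * eps_fp + 0.05 + 0.02 * rng.standard_normal(10_000)

k, mu, var = disentangle_noise(eps_fp, eps_q)
eps_corrected = correct_output(eps_q, k, mu)
print("estimated k:", k, "residual mean:", mu, "residual var:", var)
```

In this sketch the statistics would be gathered once per denoising step on calibration data and reused at sampling time; how the paper collects and applies them may differ in detail.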

Experimental Validation and Results

The efficacy of PTQD is demonstrated through extensive experiments on image synthesis with latent diffusion models on benchmarks such as ImageNet and LSUN. The headline result is a 19.9x reduction in bit operations with only a 0.06 increase in Fréchet Inception Distance (FID) relative to full-precision LDM-4 on ImageNet 256x256. PTQD also outperforms previous post-training quantization methods for diffusion models on FID and sFID while using lower precision.
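
To make the bit-operation figure more concrete, the sketch below computes the reduction for a step-wise bitwidth schedule. It assumes the commonly used definition BOPs = MACs x weight bits x activation bits, a 32-bit full-precision baseline, and an identical MAC count at every denoising step; the schedule and the helper names are hypothetical and do not reproduce the paper's configuration or its 19.9x figure.

```python
def bops(macs, w_bits, a_bits):
    """Bit operations for one forward pass (assumed definition)."""
    return macs * w_bits * a_bits

def schedule_bops(macs_per_step, schedule):
    """Total bit operations over a list of (w_bits, a_bits), one per step."""
    return sum(bops(macs_per_step, w, a) for w, a in schedule)

def reduction_vs_fp32(macs_per_step, schedule):
    """Ratio of full-precision (W32A32) BOPs to the schedule's BOPs."""
    full = schedule_bops(macs_per_step, [(32, 32)] * len(schedule))
    return full / schedule_bops(macs_per_step, schedule)

# Hypothetical 10-step sampler: W4A8 for the first 6 steps,
# W8A8 for the last 4 steps, where accumulated noise matters more.
macs_per_step = 1_000_000  # placeholder MAC count, not from the paper
mixed_schedule = [(4, 8)] * 6 + [(8, 8)] * 4

print("mixed schedule reduction:", reduction_vs_fp32(macs_per_step, mixed_schedule))
```

Under these idealised assumptions a uniform W8A8 schedule gives a 16x reduction and a uniform W4A8 schedule gives 32x, so a mixed schedule lands between the two. Reported figures can differ from this simple accounting, for example when some layers are kept at higher precision.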

Implications and Future Directions

This research provides a pathway to deploy diffusion models more feasibly on resource-constrained devices, expanding their applicability in industry scenarios requiring real-time processing. The novel approach to noise management within the quantization framework could influence future work across varied domains where computational efficiency is paramount.

The paper's implications extend into theoretical realms, contributing to our understanding of quantization effects within probabilistic generative models. Moving forward, extending PTQD to handle additional model components, or applying it to other generative architectures and tasks, might yield further advancements.

In summary, PTQD is a step towards efficient deployment of diffusion models through post-training quantization. By combining noise disentanglement, bias and variance correction, and step-aware mixed precision, it narrows the gap between quantized and full-precision sampling quality while substantially reducing computational cost.
