
Input Perturbation Reduces Exposure Bias in Diffusion Models (2301.11706v3)

Published 27 Jan 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP


Summary

  • The paper introduces input perturbation to alleviate exposure bias in DDPMs by perturbing ground-truth samples during training.
  • It achieves notable improvements, such as a state-of-the-art FID score of 1.27 on CelebA 64×64 and a 37.5% reduction in training time.
  • The method requires minimal code changes and shows potential for broader applications in domains like audio and time series.

Analysis of Input Perturbation in Diffusion Models to Mitigate Exposure Bias

The paper "Input Perturbation Reduces Exposure Bias in Diffusion Models" addresses a key limitation of Denoising Diffusion Probabilistic Models (DDPMs): exposure bias. DDPMs have gained prominence for their high-quality sample generation, but their long sampling chains make them computationally expensive and prone to error accumulation. The authors propose a training regularization that reduces this accumulation, a phenomenon akin to the exposure bias encountered in autoregressive text generation.

Problem Identification and Proposed Solution

DDPMs consist of a sequence of denoising steps, where samples are generated starting from pure noise and progressively denoised. However, a discrepancy between training and inference phases arises as training uses ground truth samples, while inference relies on previously generated samples. This mismatch leads to cumulative errors over the sampling steps, reminiscent of the exposure bias problem encountered in text generation.
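The discrepancy can be made concrete with a minimal NumPy sketch of standard DDPM training and sampling (hyperparameters such as the linear beta schedule are illustrative assumptions, not taken from the paper): at training time the network input x_t is computed in closed form from the ground-truth x_0, while at inference each x_{t-1} is computed from the previously generated x_t, so any error in the noise prediction propagates through all remaining steps.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def training_input(x0, t):
    # Training: x_t is sampled from q(x_t | x_0) using the GROUND-TRUTH x0.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def sampling_step(model_eps, x_t, t):
    # Inference: x_{t-1} is computed from the PREVIOUSLY GENERATED x_t,
    # so any error in model_eps(x_t, t) propagates to all later steps.
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    x_prev = (x_t - coef * model_eps(x_t, t)) / np.sqrt(alphas[t])
    if t > 0:
        x_prev += np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return x_prev
```

Here `model_eps` stands in for the trained noise-prediction network; chaining `sampling_step` from t = T-1 down to 0 reproduces the ancestral sampling loop in which errors accumulate.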

The authors propose a simple but effective remedy: input perturbation during training. Ground-truth samples are perturbed with additional noise to simulate inference-time prediction errors, explicitly exposing the network to the kind of imperfect inputs it will encounter at sampling time. This approach improves sample quality and reduces both training and inference times without compromising recall or precision.
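The modified training input can be sketched in a few lines of NumPy. Following the paper, an extra Gaussian perturbation scaled by a factor gamma is added to the noise used to construct the network input, while the regression target remains the original noise; the schedule below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative linear schedule
alpha_bar = np.cumprod(1.0 - betas)

def ddpm_ip_input(x0, t, gamma=0.1):
    # eps is still the regression target of the denoising loss,
    # i.e. the loss stays || eps - eps_theta(y_t, t) ||^2.
    eps = rng.standard_normal(x0.shape)
    # xi is the extra perturbation that simulates inference-time
    # prediction errors; gamma = 0.1 is the value reported in the paper.
    xi = rng.standard_normal(x0.shape)
    y_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * (eps + gamma * xi)
    return y_t, eps
```

Setting gamma = 0 recovers standard DDPM training, which is why the method needs only a minimal code change in an existing training loop.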

Empirical Validation and Results

The paper meticulously validates the proposed DDPM with Input Perturbation (DDPM-IP) technique across several benchmarks, achieving noteworthy improvements. For instance, on CelebA 64×64, a state-of-the-art Fréchet Inception Distance (FID) score of 1.27 was attained with a reduction in training time of 37.5%. Importantly, DDPM-IP maintains or enhances performance even with fewer sampling steps than the baseline model, ADM.

The robustness of the proposed methodology is further exemplified on datasets such as CIFAR10, ImageNet 32×32, and FFHQ 128×128. Across these datasets, DDPM-IP demonstrates superior generation quality, with significant FID and sFID score improvements over ADM across various sampling step configurations. This finding is crucial in practical applications where inference speed is a critical factor.

Implications and Future Directions

This paper effectively addresses practical and theoretical issues in generative models, highlighting the potential of input perturbation to resolve training-inference discrepancies. The method's simplicity, requiring minimal code adaptation and no architectural changes, supports ease of integration into existing frameworks, thereby broadening its usability.

Looking ahead, the extension of this approach to domains beyond image generation, such as audio and time series, could be an intriguing avenue. Moreover, investigating the impact of domain-specific training scenarios and hyperparameter tuning on these improvements could yield deeper insights into DDPMs' optimization.

In summary, input perturbation as a regularization technique in diffusion models represents a significant stride towards reducing exposure bias, enhancing the performance and efficiency of DDPMs and promising substantial advances in the quality and applicability of generative models across various domains.