
Input Perturbation Reduces Exposure Bias in Diffusion Models (2301.11706v3)

Published 27 Jan 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP


Summary

  • The paper introduces input perturbation to alleviate exposure bias in DDPMs by perturbing ground-truth samples during training.
  • It achieves notable improvements, such as a state-of-the-art FID score of 1.27 on CelebA 64×64 and a 37.5% reduction in training time.
  • The method requires minimal code changes and shows potential for broader applications in domains like audio and time series.

Analysis of Input Perturbation in Diffusion Models to Mitigate Exposure Bias

The paper "Input Perturbation Reduces Exposure Bias in Diffusion Models" addresses a key limitation of Denoising Diffusion Probabilistic Models (DDPMs): exposure bias. DDPMs have gained prominence for their high-quality sample generation, but their long sampling chains make them computationally expensive and prone to error accumulation. The authors propose a training regularization that reduces this accumulation, a phenomenon akin to the exposure bias encountered in autoregressive text generation.

Problem Identification and Proposed Solution

DDPMs consist of a sequence of denoising steps, where samples are generated starting from pure noise and progressively denoised. However, a discrepancy between training and inference phases arises as training uses ground truth samples, while inference relies on previously generated samples. This mismatch leads to cumulative errors over the sampling steps, reminiscent of the exposure bias problem encountered in text generation.
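The discrepancy can be made concrete with a minimal NumPy sketch of standard DDPM training and sampling (hyperparameters such as the linear beta schedule are illustrative assumptions, not taken from the paper): at training time the network input x_t is computed in closed form from the ground-truth x_0, while at inference each x_{t-1} is computed from the previously generated x_t, so any error in the noise prediction propagates through all remaining steps.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def training_input(x0, t):
    # Training: x_t is sampled from q(x_t | x_0) using the GROUND-TRUTH x0.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def sampling_step(model_eps, x_t, t):
    # Inference: x_{t-1} is computed from the PREVIOUSLY GENERATED x_t,
    # so any error in model_eps(x_t, t) propagates to all later steps.
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    x_prev = (x_t - coef * model_eps(x_t, t)) / np.sqrt(alphas[t])
    if t > 0:
        x_prev += np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return x_prev
```

Here `model_eps` stands in for the trained noise-prediction network; chaining `sampling_step` from t = T-1 down to 0 reproduces the ancestral sampling loop in which errors accumulate.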

The authors propose a simple but effective remedy: input perturbation during training. Ground-truth samples are perturbed with additional noise to simulate inference-time prediction errors, explicitly exposing the network to the kind of imperfect inputs it will encounter at sampling time. This approach improves sample quality and reduces both training and inference times without compromising recall or precision.
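The modified training input can be sketched in a few lines of NumPy. Following the paper, an extra Gaussian perturbation scaled by a factor gamma is added to the noise used to construct the network input, while the regression target remains the original noise; the schedule below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative linear schedule
alpha_bar = np.cumprod(1.0 - betas)

def ddpm_ip_input(x0, t, gamma=0.1):
    # eps is still the regression target of the denoising loss,
    # i.e. the loss stays || eps - eps_theta(y_t, t) ||^2.
    eps = rng.standard_normal(x0.shape)
    # xi is the extra perturbation that simulates inference-time
    # prediction errors; gamma = 0.1 is the value reported in the paper.
    xi = rng.standard_normal(x0.shape)
    y_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * (eps + gamma * xi)
    return y_t, eps
```

Setting gamma = 0 recovers standard DDPM training, which is why the method needs only a minimal code change in an existing training loop.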

Empirical Validation and Results

The paper meticulously validates the proposed DDPM with Input Perturbation (DDPM-IP) technique across several benchmarks, achieving noteworthy improvements. For instance, on CelebA 64×64, a state-of-the-art Fréchet Inception Distance (FID) score of 1.27 was attained with a reduction in training time of 37.5%. Importantly, DDPM-IP maintains or enhances performance even with fewer sampling steps than the baseline model, ADM.

The robustness of the proposed methodology is further exemplified on datasets such as CIFAR10, ImageNet 32×32, and FFHQ 128×128. Across these datasets, DDPM-IP demonstrates superior generation quality, with significant FID and sFID score improvements over ADM across various sampling step configurations. This finding is crucial in practical applications where inference speed is a critical factor.

Implications and Future Directions

This paper effectively addresses practical and theoretical issues in generative models, highlighting the potential of input perturbation to resolve training-inference discrepancies. The method's simplicity, requiring minimal code adaptation and no architectural changes, supports ease of integration into existing frameworks, thereby broadening its usability.

Looking ahead, the extension of this approach to domains beyond image generation, such as audio and time series, could be an intriguing avenue. Moreover, investigating the impact of domain-specific training scenarios and hyperparameter tuning on these improvements could yield deeper insights into DDPMs' optimization.

In summary, input perturbation as a regularization technique in diffusion models represents a significant stride towards reducing exposure bias, enhancing the performance and efficiency of DDPMs and promising substantial advances in the quality and applicability of generative models across various domains.