Input Perturbation Reduces Exposure Bias in Diffusion Models (2301.11706v3)
Abstract: Denoising Diffusion Probabilistic Models have shown impressive generation quality, although their long sampling chains lead to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing: the former is conditioned on ground-truth samples, while the latter is conditioned on previously generated results. To alleviate this problem, we propose a very simple but effective training regularization that perturbs the ground-truth samples to simulate inference-time prediction errors. We empirically show that, without affecting recall and precision, the proposed input perturbation leads to a significant improvement in sample quality while reducing both training and inference time. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27 while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP
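To make the regularization concrete, below is a minimal PyTorch sketch of one training step with input perturbation. The abstract does not give the exact formula, so this assumes the perturbation is an extra Gaussian noise term, scaled by a small factor `gamma`, added to the forward-process noise before building the network input, while the regression target remains the original noise; names such as `eps_model`, `alphas_cumprod`, and `gamma` are illustrative and not taken from the released code.

```python
# Minimal sketch of the input-perturbation idea described in the abstract.
# ASSUMPTION: the perturbation is an extra Gaussian noise term (scale gamma)
# added to the forward-process noise when constructing the network input;
# the training target is still the unperturbed noise. Identifiers here are
# illustrative, not the official DDPM-IP API.

import torch
import torch.nn.functional as F


def ddpm_ip_training_step(eps_model, x0, alphas_cumprod, gamma=0.1):
    """One training step with perturbed inputs (sketch).

    eps_model:      network predicting the noise eps from (x_t, t)
    x0:             batch of clean images, shape (B, C, H, W)
    alphas_cumprod: 1-D tensor of cumulative alpha products, length T
    gamma:          scale of the extra perturbation noise (assumed small)
    """
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    eps = torch.randn_like(x0)   # standard forward-process noise
    xi = torch.randn_like(x0)    # extra perturbation noise (training only)

    # Perturbed network input: simulates inference-time prediction errors
    # by feeding a slightly "wrong" x_t instead of the ground-truth one.
    x_t_perturbed = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * (eps + gamma * xi)

    # The regression target is still the unperturbed noise eps.
    pred = eps_model(x_t_perturbed, t)
    return F.mse_loss(pred, eps)
```

A small `gamma` keeps the training distribution close to the original forward process while still exposing the network to the kind of slightly off inputs it will encounter along the sampling chain, which is the stated goal of the regularization.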