Distilling ODE Solvers of Diffusion Models into Smaller Steps (2309.16421v2)
Abstract: Diffusion models have recently gained prominence as a novel category of generative models. Despite their success, these models suffer from slow sampling, requiring a number of function evaluations (NFE) on the order of hundreds or thousands. In response, both learning-free and learning-based sampling strategies have been explored to expedite the sampling process. Learning-free sampling employs various ordinary differential equation (ODE) solvers based on the formulation of diffusion ODEs, but it struggles to faithfully track the true sampling trajectory, particularly at small NFE. Conversely, learning-based sampling methods, such as knowledge distillation, demand extensive additional training, limiting their practical applicability. To overcome these limitations, we introduce Distilled-ODE solvers (D-ODE solvers), a straightforward distillation approach grounded in ODE solver formulations that combines the strengths of learning-free and learning-based sampling. D-ODE solvers are constructed by introducing a single parameter adjustment to existing ODE solvers; we then optimize D-ODE solvers with smaller steps via knowledge distillation from ODE solvers with larger steps across a batch of samples. Comprehensive experiments demonstrate the superior performance of D-ODE solvers over existing ODE solvers, including DDIM, PNDM, DPM-Solver, DEIS, and EDM, particularly in scenarios with fewer NFE. Notably, our method incurs negligible computational overhead compared to previous distillation techniques, facilitating straightforward and rapid integration with existing samplers. Qualitative analysis reveals that D-ODE solvers not only enhance image quality but also faithfully follow the target ODE trajectory.
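The mechanism described in the abstract is simple enough to sketch. Below is a minimal NumPy illustration of the idea: a single scalar per step re-weights consecutive denoiser outputs inside an otherwise standard DDIM update, and that scalar is fitted so the student's short-step output matches a teacher sampler that uses more steps. The mixing form `eps + lam * (eps - eps_prev)`, the grid-search fit, and all names (`eps_model`, `fit_lambda`, the toy schedule) are illustrative assumptions, not the paper's exact parameterization.

```python
# A minimal sketch of the D-ODE idea, assuming an epsilon-prediction model
# and a DDIM-style update; the mixing form and all names are hypothetical.
import numpy as np

ALPHAS = np.linspace(0.9999, 0.05, 1000)  # toy alpha-bar schedule (assumption)

def eps_model(x, t):
    """Placeholder noise predictor; a trained diffusion model goes here."""
    return np.tanh(x) * (1.0 + 1e-3 * t)

def ddim_step(x, t, t_next, eps):
    """Deterministic DDIM update from timestep t to t_next."""
    a_t, a_next = ALPHAS[t], ALPHAS[t_next]
    x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)  # predicted clean sample
    return np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps

def d_ode_step(x, t, t_next, lam, eps_prev):
    """Student step: one scalar `lam` mixes the current and previous
    denoiser outputs before the ordinary DDIM update (hypothetical form)."""
    eps = eps_model(x, t)
    if eps_prev is not None:
        eps = eps + lam * (eps - eps_prev)
    return ddim_step(x, t, t_next, eps), eps

def fit_lambda(x_batch, t, t_next, eps_prev, target):
    """Distillation: pick the scalar whose one-step output best matches the
    teacher target (generated with many fine-grained steps) over a batch."""
    grid = np.linspace(-1.0, 2.0, 301)
    errs = [np.mean((d_ode_step(x_batch, t, t_next, lam, eps_prev)[0] - target) ** 2)
            for lam in grid]
    return grid[int(np.argmin(errs))]
```

Because each step learns only one scalar from a batch of teacher outputs, the distillation cost is negligible relative to retraining the network, which is the practical advantage the abstract claims over prior distillation methods.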
References:
- Brian D. O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
- Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34:17981–17993, 2021.
- Dynamic dual-output diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11482–11491, 2022.
- Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2018.
- Learning gradient fields for shape generation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III, pages 364–381. Springer, 2020.
- Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
- Density estimation using Real NVP. In International Conference on Learning Representations, 2016.
- Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2011.
- Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.
- 3D equivariant diffusion for target-aware molecule generation and affinity prediction. In The Eleventh International Conference on Learning Representations, 2022.
- Efficient diffusion training via Min-SNR weighting strategy. arXiv preprint arXiv:2303.09556, 2023.
- Flexible diffusion modeling of long videos. Advances in Neural Information Processing Systems, 35:27953–27965, 2022.
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Exponential integrators. Acta Numerica, 19:209–286, 2010.
- Argmax flows and multinomial diffusion: Towards non-autoregressive language models. arXiv preprint arXiv:2102.05379, 2021.
- Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005.
- Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.
- Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2014.
- Variational diffusion models. Advances in Neural Information Processing Systems, 34:21696–21707, 2021.
- Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- Alleviating exposure bias in diffusion models through sampling with shifted time steps. arXiv preprint arXiv:2305.15583, 2023.
- Pseudo numerical methods for diffusion models on manifolds. In International Conference on Learning Representations, 2021.
- DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022a.
- DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022b.
- A study on speech enhancement based on diffusion probabilistic model. In 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 659–666. IEEE, 2021.
- Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021.
- Calvin Luo. Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970, 2022.
- Diffusion probabilistic models for 3D point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2837–2845, 2021.
- On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14297–14306, 2023.
- Symbolic music generation with diffusion models. arXiv preprint arXiv:2103.16091, 2021.
- Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
- Input perturbation reduces exposure bias in diffusion models. arXiv preprint arXiv:2301.11706, 2023.
- Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, pages 4474–4484. PMLR, 2020.
- Interpreting and improving diffusion models using the Euclidean distance function. arXiv preprint arXiv:2306.04848, 2023.
- Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
- Sequence level training with recurrent neural networks. In International Conference on Learning Representations, 2016.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
- Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
- Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2021.
- Improved techniques for training GANs. Advances in Neural Information Processing Systems, 29, 2016.
- Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
- Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a.
- Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
- Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2020b.
- Consistency models. arXiv preprint arXiv:2303.01469, 2023.
- On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning, pages 1139–1147. PMLR, 2013.
- DiGress: Discrete denoising diffusion for graph generation. In The Eleventh International Conference on Learning Representations, 2023.
- Patch diffusion: Faster and more data-efficient training of diffusion models. arXiv preprint arXiv:2304.12526, 2023.
- Learning fast samplers for diffusion models by differentiating through sample quality. In International Conference on Learning Representations, 2021.
- Fast diffusion model. arXiv preprint arXiv:2306.06991, 2023.
- Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations, 2021.
- Diffusion models: A comprehensive survey of methods and applications. arXiv preprint arXiv:2209.00796, 2022a.
- Diffusion probabilistic modeling for video generation. arXiv preprint arXiv:2203.09481, 2022b.
- Lookahead diffusion probabilistic models for refining mean estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1421–1429, 2023.
- Fast sampling of diffusion models with exponential integrator. In The Eleventh International Conference on Learning Representations, 2022.
- Bias and generalization in deep generative models: An empirical study. Advances in Neural Information Processing Systems, 31, 2018.
- Fast training of diffusion models with masked transformers. arXiv preprint arXiv:2306.09305, 2023.