Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions (2402.15602v2)
Abstract: We study the asymptotic error of score-based diffusion model sampling in large-sample scenarios from a non-parametric statistics perspective. We show that a kernel-based score estimator achieves an optimal mean square error of $\widetilde{O}\left(n^{-1} t^{-\frac{d+2}{2}}(t^{\frac{d}{2}} \vee 1)\right)$ for the score function of $p_0*\mathcal{N}(0,t\boldsymbol{I}_d)$, where $n$ and $d$ represent the sample size and the dimension, $t$ is bounded above and below by polynomials of $n$, and $p_0$ is an arbitrary sub-Gaussian distribution. As a consequence, this yields an $\widetilde{O}\left(n^{-1/2} t^{-\frac{d}{4}}\right)$ upper bound for the total variation error of the distribution of the sample generated by the diffusion model under a mere sub-Gaussian assumption. If, in addition, $p_0$ belongs to the nonparametric family of the $\beta$-Sobolev space with $\beta\le 2$, by adopting an early stopping strategy, we obtain that the diffusion model is nearly (up to log factors) minimax optimal. This removes the crucial lower bound assumption on $p_0$ in previous proofs of the minimax optimality of the diffusion model for nonparametric families.
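To make the abstract's object concrete, the sketch below illustrates one natural reading of a kernel-based (Gaussian plug-in) estimator of the score $\nabla \log\left(p_0*\mathcal{N}(0,t\boldsymbol{I}_d)\right)$ from $n$ i.i.d. samples of $p_0$. The function name `kernel_score_estimate` and the specific plug-in form are assumptions for exposition; the paper's exact estimator (e.g., any regularization, truncation, or early-stopping choices) may differ.

```python
# Minimal sketch (not the paper's exact construction) of a Gaussian-kernel
# plug-in estimator for the score of p_t = p_0 * N(0, t I_d):
#     hat{p}_t(x) = (1/n) sum_i N(x; X_i, t I_d),
#     hat{s}_t(x) = grad log hat{p}_t(x) = sum_i w_i(x) (X_i - x) / t,
# with softmax weights w_i(x) proportional to exp(-||x - X_i||^2 / (2t)).

import numpy as np


def kernel_score_estimate(x: np.ndarray, samples: np.ndarray, t: float) -> np.ndarray:
    """Estimate the score of p_0 * N(0, t I_d) at the query points x.

    x       : (m, d) query points
    samples : (n, d) i.i.d. draws from p_0
    t       : noise variance (diffusion time), t > 0
    returns : (m, d) estimated scores grad log hat{p}_t(x)
    """
    # Pairwise differences and squared distances, shapes (m, n, d) and (m, n).
    diffs = x[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    # Numerically stable softmax weights over the n kernel centers.
    logits = -sq_dists / (2.0 * t)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # grad log hat{p}_t(x) = sum_i w_i(x) (X_i - x) / t.
    return np.einsum("mn,mnd->md", weights, -diffs) / t


if __name__ == "__main__":
    # Sanity check with p_0 = N(0, I_d), so p_t = N(0, (1 + t) I_d) and the
    # true score is -x / (1 + t).
    rng = np.random.default_rng(0)
    n, d, t = 2000, 2, 0.5
    X = rng.standard_normal((n, d))
    queries = rng.standard_normal((5, d))
    est = kernel_score_estimate(queries, X, t)
    print(np.max(np.abs(est - (-queries / (1.0 + t)))))
```

The weighted-average form is the gradient of the log of a Gaussian kernel density estimate with bandwidth $\sqrt{t}$, computed via a stable softmax; it coincides with the empirical-Bayes (Tweedie-style) posterior-mean expression for the score of the smoothed distribution.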
Authors: Kaihong Zhang, Feng Liang, Jingbo Liu, Caitlyn H. Yin