Convergence of flow-based generative models via proximal gradient descent in Wasserstein space (2310.17582v3)
Abstract: Flow-based generative models enjoy certain advantages in computing data generation and likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies of the related score-based diffusion models, the analysis of flow-based models, which are deterministic in both the forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating the data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of proximal gradient descent (GD) in Wasserstein space, we prove a Kullback-Leibler (KL) guarantee of data generation by the JKO flow model of $O(\varepsilon^2)$ when using $N \lesssim \log(1/\varepsilon)$ JKO steps ($N$ residual blocks in the flow), where $\varepsilon$ is the error in the per-step first-order condition. The assumption on the data distribution is merely a finite second moment, and the theory extends to data distributions without density and to inversion errors in the reverse process, where we obtain mixed KL-$W_2$ error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which may be of independent interest. The analysis framework extends to other first-order Wasserstein optimization schemes applied to flow-based generative models.
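For concreteness, a single JKO step, i.e. the $W_2$-proximal gradient step on which the analysis is built, takes the standard Jordan-Kinderlehrer-Otto form below; the notation (step size $h$, target distribution $\pi$, e.g. a standard Gaussian in the data-to-noise direction) is generic rather than taken from the paper:

$$
p_{k+1} \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}_2(\mathbb{R}^d)} \;\mathrm{KL}(p \,\|\, \pi) \;+\; \frac{1}{2h}\, W_2^2(p, p_k), \qquad k = 0, 1, \dots, N-1.
$$

Roughly, exact steps of this recursion contract the KL objective exponentially fast, and an $\varepsilon$-error in each step's first-order optimality condition contributes an $O(\varepsilon^2)$ floor, which is why $N \lesssim \log(1/\varepsilon)$ steps suffice for the stated guarantee.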