
Normalizing flow neural networks by JKO scheme (2212.14424v4)

Published 29 Dec 2022 in stat.ML and cs.LG

Abstract: Normalizing flows are a class of deep generative models for efficient sampling and likelihood estimation that achieve attractive performance, particularly in high dimensions. The flow is often implemented as a sequence of invertible residual blocks. Existing works adopt special network architectures and regularization of flow trajectories. In this paper, we develop a neural ODE flow network called JKO-iFlow, inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which unfolds the discrete-time dynamics of the Wasserstein gradient flow. The proposed method stacks residual blocks one after another, allowing efficient block-wise training: it avoids sampling SDE trajectories, score matching, and variational learning, thus reducing the memory load and the difficulty of end-to-end training. We also develop an adaptive time-reparameterization of the flow network, with progressive refinement of the induced trajectory in probability space, to further improve model accuracy. Experiments with synthetic and real data show that the proposed JKO-iFlow network achieves competitive performance compared with existing flow and diffusion models at a significantly reduced computational and memory cost.
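The abstract refers to the JKO scheme without stating it. For context, the sketch below gives the standard variational (proximal) step that the scheme iterates, in notation chosen here rather than taken from the paper: h > 0 is the step size, π the target density (a standard normal for a normalizing flow), ρ_k the density after k steps, W_2 the Wasserstein-2 distance, and T_θ an invertible map parameterizing one residual block. Each block of JKO-iFlow is intended to realize one such step, which is what permits block-wise training.

```latex
% One JKO step: a Wasserstein-2 proximal update of the KL divergence to the target \pi.
% Iterating this step discretizes the Wasserstein gradient flow mentioned in the abstract.
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho}\;
    \mathrm{KL}\!\left(\rho \,\middle\|\, \pi\right)
    \;+\; \frac{1}{2h}\, W_2^{2}\!\left(\rho,\, \rho_k\right)

% A common sample-based surrogate for this step, assuming \pi \propto e^{-V} with
% V(x) = \|x\|^2 / 2 (standard normal target) and a block parameterized as a map
% T_\theta pushing forward samples x \sim \rho_k:
\min_{\theta}\; \mathbb{E}_{x \sim \rho_k}\!\left[
    \frac{\|T_\theta(x) - x\|^{2}}{2h}
    \;+\; V\!\big(T_\theta(x)\big)
    \;-\; \log\!\left|\det \nabla_x T_\theta(x)\right|
\right]
```

In the surrogate, the entropy of ρ_k enters the KL term only as a constant in θ, so each block can be trained from samples produced by the previous block plus a log-determinant term, without access to the full density; this is one way to read the abstract's claim that training proceeds block by block with reduced memory cost.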

Authors (3)
  1. Chen Xu (186 papers)
  2. Xiuyuan Cheng (55 papers)
  3. Yao Xie (164 papers)
Citations (17)

