Normalizing flow neural networks by JKO scheme (2212.14424v4)
Abstract: Normalizing flow is a class of deep generative models for efficient sampling and likelihood estimation, which achieves attractive performance, particularly in high dimensions. The flow is often implemented using a sequence of invertible residual blocks. Existing works adopt special network architectures and regularization of flow trajectories. In this paper, we develop a neural ODE flow network called JKO-iFlow, inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which unfolds the discrete-time dynamics of the Wasserstein gradient flow. The proposed method stacks residual blocks one after another, which allows efficient block-wise training of the residual blocks and avoids sampling SDE trajectories, score matching, and variational learning, thus reducing memory load and the difficulty of end-to-end training. We also develop adaptive time reparameterization of the flow network, with progressive refinement of the induced trajectory in probability space, to further improve model accuracy. Experiments with synthetic and real data show that the proposed JKO-iFlow network achieves competitive performance compared with existing flow and diffusion models at a significantly reduced computational and memory cost.
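The block-wise training idea can be illustrated with a short sketch. Each residual block is fit to one JKO step toward the standard Gaussian target: it minimizes a free-energy term (quadratic potential minus the log-determinant of the block Jacobian) plus a Wasserstein-2 proximal penalty that keeps the block close to the identity, and the data are then pushed through the frozen block before the next block is trained. The following is a minimal PyTorch sketch on low-dimensional toy data, not the authors' implementation; the names (`ResBlock`, `jko_block_loss`, the step size `h`) and the exact per-sample log-determinant are illustrative assumptions (a practical continuous-time flow would instead integrate the divergence of the velocity field, typically with a stochastic trace estimator).

```python
# Minimal sketch (assumed, not the paper's code): block-wise JKO training of a
# residual flow toward N(0, I) on 2-D toy data.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual block x -> x + f(x) with a smooth MLP residual."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.Softplus(),
                               nn.Linear(hidden, hidden), nn.Softplus(),
                               nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.f(x)

def jko_block_loss(block, x, h=1.0):
    """Free energy w.r.t. N(0, I) plus a Wasserstein-2 proximal penalty.
    The per-sample log|det(I + df/dx)| is computed exactly (fine in low dim)."""
    y = block(x)
    # potential term E[||y||^2 / 2] (negative log-density of N(0, I) up to a constant)
    potential = 0.5 * (y ** 2).sum(dim=1)
    # exact per-sample log-determinant of the block Jacobian
    logdets = []
    for xi in x:
        J = torch.autograd.functional.jacobian(
            block, xi.unsqueeze(0), create_graph=True).squeeze()
        logdets.append(torch.linalg.slogdet(J).logabsdet)
    logdet = torch.stack(logdets)
    # proximal term (1 / 2h) * E[||y - x||^2], approximating the W2 penalty
    prox = ((y - x) ** 2).sum(dim=1) / (2.0 * h)
    return (potential - logdet + prox).mean()

def train_stack(data, n_blocks=4, iters=500, h=1.0):
    """Train blocks one after another; block k sees samples pushed through blocks 1..k-1."""
    blocks, x = [], data
    for _ in range(n_blocks):
        block = ResBlock(dim=data.shape[1])
        opt = torch.optim.Adam(block.parameters(), lr=1e-3)
        for _ in range(iters):
            opt.zero_grad()
            jko_block_loss(block, x, h).backward()
            opt.step()
        x = block(x).detach()   # push samples forward; earlier blocks stay frozen
        blocks.append(block)
    return blocks

# Example usage on correlated Gaussian toy data:
# blocks = train_stack(torch.randn(512, 2) @ torch.tensor([[2.0, 0.0], [1.0, 0.5]]))
```

Because only one block is optimized at a time while earlier blocks stay frozen, memory and optimization effort scale with a single block rather than the whole network, which is the practical advantage the abstract emphasizes.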
Authors: Chen Xu, Xiuyuan Cheng, Yao Xie