On the representation and learning of monotone triangular transport maps (2009.10303v3)
Abstract: Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, which approximate the Knothe–Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.
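To make the idea of "representing monotone triangular maps via invertible transformations of smooth functions" concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes one particular transformation: the k-th map component is built from an unconstrained smooth function f by integrating a strictly positive function (here softplus) of its partial derivative in the last variable. The names `monotone_component`, `f`, `df_dxk`, and `n_quad` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); strictly positive for all real z.
    return np.logaddexp(0.0, z)


def monotone_component(f, df_dxk, x_prev, x_k, n_quad=32):
    """Assumed construction (illustrative only):

        S_k(x_prev, x_k) = f(x_prev, 0) + int_0^{x_k} softplus(df_dxk(x_prev, t)) dt

    The integrand is positive, so S_k is strictly increasing in x_k even when
    f itself is not monotone. The integral uses Gauss-Legendre quadrature.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * x_k * (nodes + 1.0)  # map nodes from [-1, 1] to [0, x_k]
    integral = 0.5 * x_k * np.dot(weights, softplus(df_dxk(x_prev, t)))
    return f(x_prev, 0.0) + integral


# Hypothetical smooth, non-monotone function of (x_prev, x_k) and its
# partial derivative with respect to x_k.
def f(x_prev, x_k):
    return np.sin(x_prev) * x_k - 0.5 * x_k**2


def df_dxk(x_prev, x_k):
    return np.sin(x_prev) - x_k


grid = np.linspace(-3.0, 3.0, 61)
vals = np.array([monotone_component(f, df_dxk, 0.7, xk) for xk in grid])
is_increasing = bool(np.all(np.diff(vals) > 0))
```

In this sketch the component depends only on the preceding variables `x_prev` and the current variable `x_k`, which is what makes a map assembled from such components triangular. Monotonicity in `x_k` holds by construction for any smooth `f`, so a learner could optimize over `f` without explicit monotonicity constraints.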
Authors: Ricardo Baptista, Youssef Marzouk, Olivier Zahm