Inference via robust optimal transportation: theory and methods (2301.06297v4)
Abstract: Optimal transportation theory and the related $p$-Wasserstein distance ($W_p$, $p\geq 1$) are widely applied in statistics and machine learning. Despite their popularity, inference based on these tools has some issues: it is sensitive to outliers, and it may not even be defined when the underlying model has infinite moments. To cope with these problems, we first consider a robust version of the primal transportation problem and show that it defines the robust Wasserstein distance, $W^{(\lambda)}$, which depends on a tuning parameter $\lambda > 0$. Second, we illustrate the link between $W_1$ and $W^{(\lambda)}$ and study its key measure-theoretic aspects. Third, we derive some concentration inequalities for $W^{(\lambda)}$. Fourth, we use $W^{(\lambda)}$ to define minimum distance estimators, provide their statistical guarantees, and illustrate how to apply the derived concentration inequalities to a data-driven selection of $\lambda$. Fifth, we provide the dual form of the robust optimal transportation problem and apply it to machine learning problems (generative adversarial networks and domain adaptation). Numerical exercises provide evidence of the benefits yielded by our novel methods.
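To make the outlier sensitivity concrete, the following minimal sketch compares an ordinary $W_1$ cost with a robustified cost on two tiny samples. It rests on assumptions that the abstract does not state: the robust ground cost is taken to be the truncation $\min(|x-y|, 2\lambda)$, the function name `empirical_ot_cost` is hypothetical, and the brute-force search over permutations is an illustrative stand-in that only works for very small, equal-size samples with uniform weights.

```python
from itertools import permutations


def empirical_ot_cost(x, y, lam=None):
    """Exact transport cost between two equal-size 1-D samples, uniform weights.

    lam=None : ground cost |x_i - y_j|, which yields the empirical W_1.
    lam > 0  : ground cost min(|x_i - y_j|, 2 * lam). This truncated cost is
               an ASSUMED form of the robust cost, used for illustration only.

    With uniform weights and equal sample sizes an optimal coupling can be
    taken to be a permutation, so brute force over permutations is exact.
    It is only feasible for very small samples.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")

    def ground_cost(a, b):
        d = abs(a - b)
        return d if lam is None else min(d, 2.0 * lam)

    n = len(x)
    best_total = min(
        sum(ground_cost(x[i], y[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
    return best_total / n


clean = [0.0, 1.0, 2.0, 3.0]
contaminated = [0.0, 1.0, 2.0, 1000.0]  # last point replaced by an outlier

w1 = empirical_ot_cost(clean, contaminated)
w_robust = empirical_ot_cost(clean, contaminated, lam=1.0)
print(w1, w_robust)
```

In this toy example a single outlier dominates the untruncated cost, whereas under the assumed truncation its contribution is capped at $2\lambda$ per unit of mass. For realistic sample sizes one would replace the permutation search with a linear assignment or linear programming solver.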