
Inference via robust optimal transportation: theory and methods (2301.06297v4)

Published 16 Jan 2023 in math.ST, stat.ML, and stat.TH

Abstract: Optimal transportation theory and the related $p$-Wasserstein distance ($W_p$, $p\geq 1$) are widely applied in statistics and machine learning. Despite their popularity, inference based on these tools has some issues. For instance, it is sensitive to outliers and may not even be defined when the underlying model has infinite moments. To cope with these problems, we first consider a robust version of the primal transportation problem and show that it defines the robust Wasserstein distance, $W^{(\lambda)}$, which depends on a tuning parameter $\lambda > 0$. Second, we illustrate the link between $W_1$ and $W^{(\lambda)}$ and study its key measure-theoretic aspects. Third, we derive some concentration inequalities for $W^{(\lambda)}$. Fourth, we use $W^{(\lambda)}$ to define minimum distance estimators, provide their statistical guarantees, and illustrate how to apply the derived concentration inequalities to a data-driven selection of $\lambda$. Fifth, we provide the dual form of the robust optimal transportation problem and apply it to machine learning problems (generative adversarial networks and domain adaptation). Numerical exercises provide evidence of the benefits yielded by our novel methods.
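The abstract does not spell out the robust primal problem, so the following is only an illustrative sketch under a stated assumption: that the robust cost is the ground distance truncated at $2\lambda$, i.e. $c_\lambda(x,y)=\min\{d(x,y),\,2\lambda\}$. The function name `truncated_cost_ot` and the restriction to two equal-size samples with uniform weights are hypothetical choices made for the example, not the authors' implementation. Under these assumptions the transport problem reduces to an assignment problem, which SciPy can solve exactly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def truncated_cost_ot(x, y, lam):
    """Transport cost between two equal-size samples with uniform weights.

    ASSUMPTION (not stated in the abstract): the robust cost is the
    Euclidean distance truncated at 2 * lam.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(x.shape[0], -1)  # treat 1-D input as n points in R^1
    y = y.reshape(y.shape[0], -1)
    if x.shape[0] != y.shape[0]:
        raise ValueError("this sketch only handles equal sample sizes")

    # Pairwise Euclidean distances, then truncation at 2 * lam.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    cost = np.minimum(dist, 2.0 * lam)

    # With uniform weights and equal sizes, an optimal plan is a permutation,
    # so the assignment solver gives the exact optimal transport cost.
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


clean = [0.0, 1.0, 2.0, 3.0]
contaminated = [0.0, 1.0, 2.0, 1000.0]  # one gross outlier

robust_value = truncated_cost_ot(clean, contaminated, lam=1.0)
untruncated_value = truncated_cost_ot(clean, contaminated, lam=1e6)
print(robust_value, untruncated_value)
```

With a very large `lam` the truncation never binds and the value coincides with the ordinary $W_1$ cost between the two samples, which the single outlier dominates. With a small `lam` each pair contributes at most $2\lambda$, so the outlier's influence is bounded.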

