Learning Elastic Costs to Shape Monge Displacements (2306.11895v2)

Published 20 Jun 2023 in stat.ML and cs.LG

Abstract: Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other. This efficiency is quantified by defining a \textit{cost} function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance, $\ell^2_2(\mathbf{x},\mathbf{y})=\tfrac12\|\mathbf{x}-\mathbf{y}\|_2^2$. Recently, Cuturi et al. '23 highlighted the benefits of using elastic costs, defined through a regularizer $\tau$ as $c(\mathbf{x},\mathbf{y})=\ell^2_2(\mathbf{x},\mathbf{y})+\tau(\mathbf{x}-\mathbf{y})$. Such costs shape the \textit{displacements} of Monge maps $T$, i.e., the difference between a source point and its image, $T(\mathbf{x})-\mathbf{x}$, by giving them a structure that matches that of the proximal operator of $\tau$. In this work, we make two important contributions to the study of elastic costs: (i) For any elastic cost, we propose a numerical method to compute Monge maps that are provably optimal. This provides a much-needed routine to create synthetic problems where the ground-truth OT map is known, by analogy to the Brenier theorem, which states that the gradient of any convex potential is always a valid Monge map for the $\ell_2^2$ cost; (ii) We propose a loss to \textit{learn} the parameter $\theta$ of a parameterized regularizer $\tau_\theta$, and apply it in the case where $\tau_{A}(\mathbf{z})=\|A^\perp \mathbf{z}\|^2_2$. This regularizer promotes displacements that lie on a low-dimensional subspace of $\mathbb{R}^d$, spanned by the $p$ rows of $A\in\mathbb{R}^{p\times d}$.
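
To make the abstract's quantities concrete, here is a minimal, illustrative JAX sketch; it is not the authors' implementation and does not rely on any OT library. It spells out the elastic cost with the subspace regularizer $\tau_A$, a closed-form proximal operator for it under the assumption that $A$ has orthonormal rows, and a Brenier-style map obtained as the gradient of a toy convex potential. The dimensions, the weight `gamma`, and the choice of potential are hypothetical choices made only for illustration.

```python
# Minimal sketch (not the paper's code) of an elastic cost
# c(x, y) = 0.5 * ||x - y||^2 + tau(x - y) with the subspace
# regularizer tau_A(z) = ||A_perp z||^2, assuming the rows of A
# are orthonormal so that the orthogonal component is z - A^T A z.
import jax
import jax.numpy as jnp

d, p = 5, 2  # ambient dimension, subspace dimension (illustrative)

# Orthonormal rows spanning the displacement subspace (toy choice).
A = jnp.eye(d)[:p, :]                      # shape (p, d)

def tau_A(z, gamma=1.0):
    """Penalize the component of z orthogonal to the row space of A."""
    z_perp = z - A.T @ (A @ z)
    return gamma * jnp.sum(z_perp ** 2)

def elastic_cost(x, y):
    """c(x, y) = 0.5 * ||x - y||^2 + tau_A(x - y)."""
    z = x - y
    return 0.5 * jnp.sum(z ** 2) + tau_A(z)

def prox_tau_A(z, sigma=1.0, gamma=1.0):
    """Proximal operator of sigma * tau_A: keeps the on-subspace part of z
    and shrinks the orthogonal part by 1 / (1 + 2 * sigma * gamma)."""
    z_par = A.T @ (A @ z)
    z_perp = z - z_par
    return z_par + z_perp / (1.0 + 2.0 * sigma * gamma)

# Brenier-style ground-truth map for the squared-Euclidean cost:
# the gradient of any convex potential is an optimal Monge map.
def convex_potential(x):
    return 0.5 * jnp.sum(x ** 2) + jnp.logaddexp(0.0, jnp.sum(x))

brenier_map = jax.grad(convex_potential)   # T(x) = grad f(x)

x = jnp.ones(d)
print(elastic_cost(x, brenier_map(x)))
print(prox_tau_A(x))                       # off-subspace part is shrunk
```

Note how the proximal operator leaves the component of $\mathbf{z}$ inside the row space of $A$ untouched and shrinks the orthogonal component; this is the mechanism by which such a regularizer promotes displacements that lie on a low-dimensional subspace.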

References (59)
  1. Projection-like retractions on matrix manifolds. SIAM Journal on Optimization, 22(1):135–158, 2012.
  2. The generalized lasso problem and uniqueness. Electronic Journal of Statistics, 13:2307–2347, 2019.
  3. Input Convex Neural Networks. volume 34, 2017.
  4. Convex analysis and monotone operator theory in Hilbert spaces, volume 408. Springer, 2011.
  5. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
  6. Efficient and modular implicit differentiation. arXiv preprint arXiv:2105.15183, 2021.
  7. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51:22–45, 2015.
  8. Nicolas Boumal. An introduction to optimization on smooth manifolds. Cambridge University Press, 2023.
  9. Yann Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4), 1991. doi: 10.1002/cpa.3160440402.
  10. Learning single-cell perturbation responses using neural optimal transport. bioRxiv, 2021. doi: 10.1101/2021.12.15.472775.
  11. Supervised training of conditional monge maps. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 6859–6872. Curran Associates, Inc., 2022.
  12. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, 2021.
  13. Joint distribution optimal transportation for domain adaptation. Advances in Neural Information Processing Systems, 30, 2017.
  14. Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in neural information processing systems, pages 2292–2300, 2013.
  15. Optimal Transport Tools (OTT): A JAX toolbox for all things Wasserstein. arXiv preprint arXiv:2201.12324, 2022.
  16. Monge, Bregman and Occam: Interpretable optimal transport in high-dimensions with feature-sparse maps. In Proceedings of the 40th ICML, 2023.
  17. Arnak S Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society. Series B (Statistical Methodology), pages 651–676, 2017.
  18. Rates of estimation of optimal transport maps using plug-in estimators via barycentric projections. arXiv preprint arXiv:2107.01718, 2021.
  19. An improved central limit theorem and fast convergence rates for entropic transportation costs. arXiv preprint arXiv:2204.09105, 2022.
  20. Max-sliced Wasserstein distance and its use for GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10648–10656, 2019.
  21. Optimal transport map estimation in general function spaces. arXiv preprint arXiv:2212.03722, 2022.
  22. Richard Mansfield Dudley et al. Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois Journal of Mathematics, 10(1):109–126, 1966.
  23. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
  24. Limit theorems for entropic optimal transport maps and the Sinkhorn divergence. arXiv preprint arXiv:2207.08683, 2022.
  25. Matrix computations. JHU Press, 2013.
  26. A Riemannian block coordinate descent method for computing the projection robust Wasserstein distance. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 4446–4455. PMLR, 2021.
  27. Minimax estimation of smooth optimal transport maps. The Annals of Statistics, 49(2), 2021.
  28. Multi-subject MEG/EEG source imaging with sparse multi-task regression. NeuroImage, 2020.
  29. Generalized sliced Wasserstein distances. Advances in Neural Information Processing Systems, 32, 2019.
  30. Wasserstein-2 generative networks. 2019.
  31. Do Neural Optimal Transport Solvers Work? A Continuous Wasserstein-2 Benchmark. 2021.
  32. Tree-sliced variants of Wasserstein distances. Advances in Neural Information Processing Systems, 32, 2019.
  33. Projection robust Wasserstein distance and Riemannian optimization. Advances in Neural Information Processing Systems, 33:9383–9397, 2020.
  34. On projection robust optimal transport: Sample complexity and model misspecification. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 262–270. PMLR, 13–15 Apr 2021a. URL https://proceedings.mlr.press/v130/lin21a.html.
  35. On projection robust optimal transport: Sample complexity and model misspecification. In International Conference on Artificial Intelligence and Statistics, pages 262–270. PMLR, 2021b.
  36. Optimal transport mapping via input convex neural networks. volume 37, 2020.
  37. Plugin estimation of smooth optimal transport maps. arXiv preprint arXiv:2107.12364, 2021.
  38. Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences, 1781.
  39. Near-optimal estimation of smooth transport maps with kernel sums-of-squares. arXiv preprint arXiv:2112.01907, 2021.
  40. Estimation of Wasserstein distances in the spiked transport model. Bernoulli, 28(4):2663–2688, 2022.
  41. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  42. Proximal algorithms. Foundations and Trends® in Optimization, 1(3):127–239, 2014.
  43. Subspace robust Wasserstein distances. arXiv preprint arXiv:1901.08949, 2019.
  44. Computational optimal transport. Foundations and Trends® in Machine Learning, 11, 2019.
  45. Entropic estimation of optimal transport maps. arXiv preprint arXiv:2109.12004, 2021.
  46. Minimax estimation of discontinuous optimal transport maps: The semi-discrete case. arXiv preprint arXiv:2301.11302, 2023.
  47. Wasserstein barycenter and its application to texture mixing. In Scale Space and Variational Methods in Computer Vision: Third International Conference, pages 435–446. Springer, 2012.
  48. On the sample complexity of entropic optimal transport. arXiv preprint arXiv:2206.13472, 2022.
  49. R. Tyrrell Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.
  50. Improving GANs using optimal transport. In International Conference on Learning Representations, 2018.
  51. Sinkformers: Transformers with doubly stochastic attention. In International Conference on Artificial Intelligence and Statistics. PMLR, 2022.
  52. Filippo Santambrogio. Optimal transport for applied mathematicians. Springer, 2015.
  53. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019.
  54. Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Statist., 35:876–879, 1964.
  55. Sparse Sinkhorn attention. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research. PMLR, 13–18 Jul 2020.
  56. TrajectoryNet: A dynamic optimal transport network for modeling cellular dynamics. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 9526–9536. PMLR, 2020.
  57. Parameter tuning and model selection in optimal transport with semi-dual Brenier formulation. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
  58. Semi-relaxed Gromov-Wasserstein divergence and applications on graphs. In International Conference on Learning Representations, 2023.
  59. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A), 2019.
Citations (3)
