Sample Complexity of Sinkhorn divergences (1810.02733v2)

Published 5 Oct 2018 in math.ST and stat.TH

Abstract: Optimal transport (OT) and maximum mean discrepancies (MMD) are now routinely used in machine learning to compare probability measures. We focus in this paper on \emph{Sinkhorn divergences} (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$). Although the tradeoff induced by that regularization is now well understood computationally (OT, SDs and MMD require respectively $O(n^3\log n)$, $O(n^2)$ and $n^2$ operations given a sample size $n$), much less is known in terms of their \emph{sample complexity}, namely the gap between these quantities, when evaluated using finite samples \emph{vs.} their respective densities. Indeed, while the sample complexity of OT and MMD stand at two extremes, $1/n^{1/d}$ for OT in dimension $d$ and $1/\sqrt{n}$ for MMD, that for SDs has only been studied empirically. In this paper, we \emph{(i)} derive a bound on the approximation error made with SDs when approximating OT as a function of the regularizer $\varepsilon$, \emph{(ii)} prove that the optimizers of regularized OT are bounded in a Sobolev (RKHS) ball independent of the two measures and \emph{(iii)} provide the first sample complexity bound for SDs, obtained by reformulating SDs as a maximization problem in a RKHS. We thus obtain a scaling in $1/\sqrt{n}$ (as in MMD), with a constant that depends however on $\varepsilon$, making the bridge between OT and MMD complete.

Citations (265)

Summary

The paper bounds the error made when Sinkhorn divergences approximate OT, as a function of the regularization strength ε, which interpolates between OT (ε = 0) and MMD (ε = ∞).
It proves the first sample complexity bound for Sinkhorn divergences: a 1/√n rate, as for MMD, with a constant that depends on ε.
It shows that the optimizers of regularized OT lie in a Sobolev (RKHS) ball independent of the two measures, which justifies computing regularized OT with kernel-based stochastic gradient descent (kernel-SGD).

Sample Complexity of Sinkhorn Divergences

The paper "Sample Complexity of Sinkhorn Divergences" examines Sinkhorn divergences (SDs) in relation to Optimal Transport (OT) and Maximum Mean Discrepancies (MMD), two frameworks routinely used in machine learning to compare probability measures. The authors focus on the sample complexity of SDs, that is, the gap between an SD evaluated on finite samples and the same quantity evaluated on the underlying densities.
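
For reference, the quantities discussed here can be written out as follows. This is a sketch in the notation we understand the paper to use; the symbols $\alpha$, $\beta$ (the two measures), $c$ (the ground cost), $\pi$ (a coupling with marginals $\alpha$ and $\beta$) and $\hat\alpha_n$, $\hat\beta_n$ (empirical measures built from $n$ samples) are not introduced in this summary and are assumptions of the sketch.

$$
W_\varepsilon(\alpha,\beta) = \min_{\pi \in \Pi(\alpha,\beta)} \int c(x,y)\, d\pi(x,y) + \varepsilon \int \log\!\left(\frac{d\pi(x,y)}{d\alpha(x)\, d\beta(y)}\right) d\pi(x,y)
$$

$$
\bar{W}_\varepsilon(\alpha,\beta) = W_\varepsilon(\alpha,\beta) - \tfrac{1}{2}\bigl(W_\varepsilon(\alpha,\alpha) + W_\varepsilon(\beta,\beta)\bigr)
$$

$$
\text{sample complexity:}\quad \mathbb{E}\,\bigl| W_\varepsilon(\alpha,\beta) - W_\varepsilon(\hat\alpha_n,\hat\beta_n) \bigr| \quad \text{as a function of } n
$$

The first line is entropy-regularized OT, the second is the Sinkhorn divergence built from it, and the third is the finite-sample gap whose decay in $n$ the paper studies.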

Key Contributions

Approximation Error Bound: The paper derives a bound on the approximation error of OT by SDs as a function of the regularization parameter $\varepsilon$. This quantifies how fast regularized OT converges to standard OT for continuous measures as $\varepsilon \to 0$. It makes explicit that $\varepsilon$ controls the approximation quality, interpolating between OT and MMD.
Sample Complexity Bound for SDs: The authors prove the first sample complexity bound for Sinkhorn divergences by reformulating SDs as a maximization problem in a Reproducing Kernel Hilbert Space (RKHS). This shows that SDs converge at a rate of $1/\sqrt{n}$, as MMD does, although the constant in the rate depends on $\varepsilon$. This result completes the bridge between OT and MMD.
Sobolev Space Bound on Optimizers: The optimizers of regularized OT are shown to lie within a Sobolev (RKHS) ball that is independent of the two measures. This bound underpins the sample complexity result and justifies the use of kernel-based stochastic gradient descent (kernel-SGD) to compute regularized OT within the RKHS framework.
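
To make the objects in these contributions concrete, the following is a minimal sketch of how a Sinkhorn divergence between two samples might be computed. It uses plain log-domain Sinkhorn iterations on the dual potentials, not the kernel-SGD approach mentioned above. The function names `entropic_ot` and `sinkhorn_divergence`, the squared Euclidean cost, the uniform sample weights and the fixed iteration count are all assumptions of this sketch rather than choices taken from the paper.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))


def entropic_ot(x, y, eps, n_iter=200):
    """Approximate entropic OT cost W_eps between uniform empirical measures.

    Assumptions of this sketch: squared Euclidean cost, uniform weights,
    and a fixed number of alternating updates of the dual potentials u, v.
    """
    n, m = len(x), len(y)
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # (n, m)
    log_a = np.full(n, -np.log(n))  # log-weights of the first sample
    log_b = np.full(m, -np.log(m))  # log-weights of the second sample
    u = np.zeros(n)
    v = np.zeros(m)
    for _ in range(n_iter):
        # Each update is a soft-min of (cost - other potential), O(n * m) work.
        u = -eps * _logsumexp(log_b[None, :] + (v[None, :] - cost) / eps, axis=1)
        v = -eps * _logsumexp(log_a[:, None] + (u[:, None] - cost) / eps, axis=0)
    # Right after the v-update the coupling has total mass 1, so the
    # exponential term of the dual cancels and the value is <u, a> + <v, b>.
    return float(np.mean(u) + np.mean(v))


def sinkhorn_divergence(x, y, eps, n_iter=200):
    """Sinkhorn divergence: W_eps(x, y) - (W_eps(x, x) + W_eps(y, y)) / 2."""
    w_xy = entropic_ot(x, y, eps, n_iter)
    w_xx = entropic_ot(x, x, eps, n_iter)
    w_yy = entropic_ot(y, y, eps, n_iter)
    return w_xy - 0.5 * (w_xx + w_yy)


# Usage: two Gaussian samples in dimension 2, the second one shifted.
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))
y = rng.normal(size=(40, 2)) + 2.0
print("SD(x, y):", sinkhorn_divergence(x, y, eps=1.0))
print("SD(x, x):", sinkhorn_divergence(x, x, eps=1.0))
```

Each iteration costs $O(n^2)$ operations for samples of size $n$, in line with the cost quoted in the abstract for SDs. Repeating the computation over independent draws of increasing size is one way to observe the finite-sample gap empirically for a given $\varepsilon$.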

Implications and Speculation on Future Developments

The theoretical analysis in this paper provides a foundation for understanding the trade-offs involved in choosing the regularization parameter for SDs in machine learning tasks. The results are most relevant in high-dimensional settings, where classical OT is both expensive to compute ($O(n^3\log n)$ operations) and slow to estimate from samples (error of order $1/n^{1/d}$ in dimension $d$). The interpolation between OT and MMD suggests adaptive algorithms that adjust the regularization parameter $\varepsilon$ in response to dataset characteristics, balancing approximation quality against statistical accuracy and computational tractability.

The work's implications extend beyond the immediate results, proposing a framework where divergence-based techniques can be finely tuned to suit specific problem settings by leveraging the flexibility of SDs. Given the foundational nature of comparative measures in generative modeling, transportation problems, and classification tasks, further exploration of SDs could yield robust methodologies applicable in these domains.

In conclusion, this paper extends the theoretical understanding of SDs, positions them as a flexible tool between OT and MMD, and suggests practical ways to use them in broader machine learning and statistical tasks. Future research could extend these results, for example by exploring adaptive kernels or the derivatives of SDs in more complex feature spaces, potentially including those learned by deep networks.