Kernel Thinning (2105.05842v11)

Published 12 May 2021 in stat.ML, cs.LG, math.ST, stat.CO, stat.ME, and stat.TH

Abstract: We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $O_d(n^{-1/2} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
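The worst-case integration error over the unit ball of a reproducing kernel Hilbert space is the maximum mean discrepancy (MMD). The NumPy sketch below only illustrates how that error metric is computed for the baselines the abstract compares against; it is not the kernel thinning algorithm and does not use the goodpoints API. The Gaussian kernel with bandwidth 1, a standard normal $\mathbb{P}$ in $d=2$, and $n=1024$ are arbitrary assumptions, and the helper names `gaussian_kernel` and `mmd` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Hypothetical helper: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for all pairs
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    # Hypothetical helper: MMD between the empirical distributions on X and Y,
    # i.e. sup over the RKHS unit ball of |mean f(X) - mean f(Y)|
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

rng = np.random.default_rng(0)
n, d = 1024, 2
X = rng.standard_normal((n, d))  # assumed n-point input approximation to P
m = int(np.sqrt(n))              # sqrt(n)-point output size (32 here)

iid = X[rng.choice(n, size=m, replace=False)]  # random subsample of size m
standard = X[:: n // m]                        # standard thinning: every (n/m)-th point

print("n^(-1/4)                :", n ** -0.25)
print("MMD, random subsample   :", mmd(X, iid))
print("MMD, standard thinning  :", mmd(X, standard))
```

Kernel thinning aims to choose the $\sqrt{n}$ output points so that this same MMD shrinks at roughly $n^{-1/2}$ (up to log factors) rather than the $n^{-1/4}$ rate typical of the two baselines above.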

GitHub

GitHub - microsoft/goodpoints: A Python package for generating concise, high-quality summaries of a probability distribution