For Kernel Range Spaces a Constant Number of Queries Are Sufficient (2306.16516v1)
Abstract: We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ such that for any $p \in \mathbb{R}^d$ we have $\frac{1}{n} \|R_p - R_q\|_1 \leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler's notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query), where the resulting vectors $R_p$ lie in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility to the notion of inside and outside of a query range. Our main result is that, unlike for combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and the dimension $d$. We obtain a bound of $(1/\varepsilon)^{\tilde{O}(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, the curse of dimensionality eventually disappears, which may help explain the success of machine learning in very high dimensions. We complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing that the exponential dependence on $1/\varepsilon$ is necessary.
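As a rough illustration of these definitions (not code from the paper), the following Python sketch computes the query vector $R_p$ for the Gaussian kernel and checks the $\varepsilon$-cover condition for a candidate set $Q$ against a finite sample of probe queries standing in for all of $\mathbb{R}^d$; all function names and data below are hypothetical.

```python
import numpy as np

def query_vector(p, X):
    """Return R_p in [0,1]^n with (R_p)_i = K(p, x_i) = exp(-||p - x_i||^2)."""
    diffs = X - p                                    # shape (n, d)
    return np.exp(-np.sum(diffs ** 2, axis=1))

def is_eps_cover(Q, X, probes, eps):
    """Check (1/n) * ||R_p - R_q||_1 <= eps for some q in Q, for each probe p."""
    n = len(X)
    RQ = np.array([query_vector(q, X) for q in Q])   # precompute R_q for all q in Q
    for p in probes:
        Rp = query_vector(p, X)
        if np.min(np.sum(np.abs(RQ - Rp), axis=1)) / n > eps:
            return False                             # this probe is not eps-covered
    return True

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # n = 100 points in R^3
Q = rng.normal(size=(50, 3))         # candidate cover points
probes = rng.normal(size=(20, 3))    # sampled query points (in place of all of R^d)
print(is_eps_cover(Q, X, probes, eps=0.5))
```

Note that checking only a finite set of probes is a heuristic stand-in for the quantifier over all of $\mathbb{R}^d$ in the definition; it is meant only to make the $L_1$ condition concrete.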
- Pankaj K Agarwal. Range searching. In Handbook of discrete and computational geometry, pages 1057–1092. Chapman and Hall/CRC, 2017.
- Maximum number of modes of Gaussian mixtures. Information and Inference: A Journal of the IMA, 9:587–600, 2020.
- Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
- On the equivalence between herding and conditional gradient algorithms. In Proceedings of the 29th International Conference on Machine Learning (ICML’12), pages 1355–1362, 2012.
- On the number of modes of a Gaussian mixture. In International Conference on Scale-Space Theories in Computer Vision, pages 625–640. Springer, 2003.
- Kernel density estimation through density constrained near neighbor search. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 172–183. IEEE, 2020.
- Hashing-based-estimators for kernel density in high dimensions. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 1032–1043. IEEE, 2017.
- Super-samples from kernel herding. In Conference on Uncertainty in Artificial Intelligence, 2010.
- Optimal approximations made easy. Information Processing Letters, 176:106250, 2022.
- Add isotropic Gaussian kernels at own risk: More and more resilient modes in higher dimensions. In Proceedings of the 28th Annual Symposium on Computational Geometry (SoCG), pages 91–100, 2012.
- Support vector subset scan for spatial pattern detection. Computational Statistics & Data Analysis, 157:107149, 2021.
- The kernel spatial scan statistic. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 349–358, 2019.
- Sariel Har-Peled. Geometric Approximation Algorithms. American Mathematical Society, 2011.
- David Haussler. Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2):217–232, 1995.
- Gaussian process subset scanning for anomalous pattern detection in non-iid data. In International Conference on Artificial Intelligence and Statistics, pages 425–434. PMLR, 2018.
- On outer bi-Lipschitz extensions of linear Johnson-Lindenstrauss embeddings of low-dimensional submanifolds of $\mathbb{R}^n$. arXiv:2206.03376, pages 1–19, 2022.
- A faster interior point method for semidefinite programming. In IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), 2020.
- Comparing distributions and shapes using the kernel distance. In Proceedings of the twenty-seventh annual symposium on Computational geometry, pages 47–56, 2011.
- Discrepancy, coresets, and sketches in machine learning. In Conference on Learning Theory, pages 1975–1993. PMLR, 2019.
- Deann: Speeding up kernel-density estimation using approximate nearest neighbor search. In International Conference on Artificial Intelligence and Statistics, pages 3108–3137. PMLR, 2022.
- Martin Kulldorff. A spatial scan statistic. Communications in Statistics-Theory and methods, 26(6):1481–1496, 1997.
- Sequential kernel herding: Frank-wolfe optimization for particle filtering. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (PMLR), pages 544–552, 2015.
- Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62:516–527, 2001.
- Towards a learning theory of cause-effect inference. In International Conference on Machine Learning, pages 1452–1461. PMLR, 2015.
- Computing approximate statistical discrepancy. In International Symposium on Algorithms and Computation (ISAAC), pages 32:1–32:13, 2018.
- Scalable spatial scan statistics through sampling. In Proceedings of the 24th ACM SIGSPATIAL international conference on advances in geographic information systems, pages 1–10, 2016.
- Generalization error bounds for bayesian mixture algorithms. Journal of Machine Learning Research, pages 839–860, 2003.
- Foundations of Machine Learning. MIT Press, Second Edition, 2018.
- Nabil H Mustafa. Sampling in Combinatorial and Geometric Set Systems. American Mathematical Society (AMS), Mathematical surveys and monographs, 2022.
- Optimal terminal dimensionality reduction in euclidean space. In ACM Symposium on Theory of Computing (STOC), ACM, pages 1064–1069, 2019.
- Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information System, 4(3):313–332, 1990.
- Jeff M Phillips. ε-samples for kernels. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013.
- Near-optimal coresets of kernel density estimates. Discrete & Computational Geometry, 63(4):867–887, 2020.
- Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, 2018.
- Clayton Scott. Rademacher complexity of kernel classes, 2014.
- Wai Ming Tai. Optimal Coreset for Gaussian Kernel Density Estimation. In 38th International Symposium on Computational Geometry (SoCG 2022), volume 224, 2022.
- Michel Talagrand. Sharper bounds for Gaussian and empirical processes. Annals of Probability, 22(1):28–76, 1994.
- Vladimir Vapnik. Principles of risk minimization for learning theory. Advances in neural information processing systems, 4, 1991.
- Coresets for kernel regression. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 645–654, 2017.