Randomized Sketches for Kernels: Fast and Optimal Non-Parametric Regression
The paper tackles the computational inefficiency inherent in Kernel Ridge Regression (KRR) by proposing a novel approximation technique leveraging randomized sketches. KRR is a robust method for non-parametric regression in reproducing kernel Hilbert spaces (RKHS), yet its applicability is often hampered by computational demands, with time complexity scaling as O(n³) and space complexity as O(n²). This computational overhead becomes prohibitive as the sample size n increases.
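For concreteness, here is a minimal numpy sketch of the exact KRR solve under the standard closed form; the variable names and objective scaling are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def krr_exact(K, y, lam):
    """Exact KRR: minimize (1/2n)||y - K w||^2 + (lam/2) w' K w.

    A minimizer satisfies (K + n*lam*I) w = y.  The dense linear
    solve is the O(n^3) time bottleneck, and simply storing the
    kernel matrix K already costs O(n^2) memory.
    """
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)
```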
The authors address this challenge by approximating KRR with m-dimensional randomized sketches of the kernel matrix, reducing computation while retaining minimax optimality. The core question is how small the sketch dimension m can be chosen while still preserving the statistical optimality of the approximation.
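A plausible rendering of the sketched estimator is shown below, under the natural assumption that the n-dimensional coefficient vector is restricted to the row space of a sketch matrix S of shape (m, n); the exact program and normalization in the paper may differ.

```python
import numpy as np

def krr_sketched(K, y, lam, S):
    """Sketched KRR: restrict w = S.T @ alpha with alpha in R^m.

    Substituting into the KRR objective yields an m-dimensional
    problem: (S K^2 S'/n + lam * S K S') alpha = S K y / n.
    The dominant costs become forming S @ K (cheaper still with
    structured sketches) and an O(m^3) solve, with m << n.
    """
    n = K.shape[0]
    SK = S @ K                                # (m, n)
    A = SK @ SK.T / n + lam * (SK @ S.T)      # m x m system matrix
    b = SK @ y / n
    alpha = np.linalg.solve(A, b)
    return S.T @ alpha                        # lift back to n kernel weights
```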
Methodological Overview
Several types of randomized sketches are explored, including those based on Gaussian random matrices and randomized Hadamard matrices. The essential insight is that it suffices to choose the sketch dimension m proportional to the statistical dimension of the kernel matrix, up to logarithmic factors, thus circumventing the costly full-rank computation while still achieving minimax optimality.
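The paper defines the statistical dimension through the kernel matrix's eigendecay and a critical radius; as a rough, computable proxy one can use the effective degrees of freedom tr(K(K + n·lam·I)^{-1}), sketched below. The proxy choice is our assumption, not the paper's exact definition.

```python
import numpy as np

def effective_dimension(K, lam):
    """Effective degrees of freedom tr(K (K + n*lam*I)^{-1}).

    This eigenvalue-based quantity serves as a proxy for the
    statistical dimension that governs how large the sketch size m
    must be, up to constants and logarithmic factors.
    """
    n = K.shape[0]
    mu = np.linalg.eigvalsh(K)        # eigenvalues of the kernel matrix
    return float(np.sum(mu / (mu + n * lam)))
```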
The statistical dimension of the kernel is pivotal in assessing approximation quality: it quantifies the kernel matrix's effective degrees of freedom. With this notion in hand, the paper scrutinizes two sketch families:
- Sub-Gaussian Sketches: matrices with i.i.d. zero-mean 1-sub-Gaussian rows, for which the sketch dimension m can scale directly with the statistical dimension.
- ROS Sketches: randomized orthogonal system sketches (e.g., built from Hadamard or discrete Fourier transform matrices), for which m must additionally carry logarithmic factors. Both constructions are illustrated in the code sketch after this list.
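A minimal construction of both sketch families, assuming the standard scalings (Gaussian entries of variance 1/m; a subsampled randomized Hadamard transform for ROS). Details such as the row-sampling distribution are simplifying assumptions.

```python
import numpy as np
from scipy.linalg import hadamard

def gaussian_sketch(m, n, rng):
    """Sub-Gaussian sketch: i.i.d. N(0, 1/m) entries, so each row is
    zero-mean and 1-sub-Gaussian after the 1/sqrt(m) scaling."""
    return rng.standard_normal((m, n)) / np.sqrt(m)

def ros_sketch(m, n, rng):
    """ROS sketch built from a Hadamard matrix: S = sqrt(n/m) P H D,
    where D is a random-sign diagonal, H is the orthonormal Hadamard
    transform, and P samples m rows uniformly.  Requires n to be a
    power of two (in practice one pads the data)."""
    D = rng.choice([-1.0, 1.0], size=n)            # Rademacher signs
    H = hadamard(n) / np.sqrt(n)                   # orthonormal Hadamard
    rows = rng.choice(n, size=m, replace=True)     # uniform row sampling
    return np.sqrt(n / m) * (H[rows] * D)
```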
Empirical and Theoretical Results
The authors provide a rigorous theoretical foundation, proving that for Gaussian and ROS sketches the sketched KRR retains minimax optimality under the stated dimensionality conditions. Simulation results corroborate these predictions, showing that the sketched estimator matches the performance of exact KRR across various kernel classes, including polynomial and Gaussian kernels and Sobolev spaces.
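An illustrative, self-contained simulation in the spirit of those experiments (not the paper's actual setup): a Gaussian-kernel regression on synthetic data, comparing exact KRR against a Gaussian-sketched solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 512, 64, 1e-3

# Synthetic 1-d regression problem with a Gaussian kernel.
x = np.sort(rng.uniform(-1.0, 1.0, n))
f_true = np.sin(4 * np.pi * x)
y = f_true + 0.5 * rng.standard_normal(n)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2)

# Exact KRR: O(n^3) solve.
w_exact = np.linalg.solve(K + n * lam * np.eye(n), y)

# Gaussian-sketched KRR: m-dimensional solve.
S = rng.standard_normal((m, n)) / np.sqrt(m)
SK = S @ K
alpha = np.linalg.solve(SK @ SK.T / n + lam * (SK @ S.T), SK @ y / n)
w_sketch = S.T @ alpha

for name, w in [("exact", w_exact), ("sketched", w_sketch)]:
    print(f"{name:8s} in-sample MSE vs truth: {np.mean((K @ w - f_true) ** 2):.4f}")
```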
The paper further contrasts the method with Nyström-based approaches, which form low-rank approximations to the kernel matrix by sampling columns. The authors show that Nyström methods can fare poorly on irregular data distributions precisely because of this dependence on column sampling, unlike the randomized sketches proposed here.
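For contrast, a generic uniform-column-sampling Nyström approximation is sketched below; this is the textbook construction, not necessarily the exact variant the paper analyzes. Its accuracy hinges on the sampled columns covering the data, which is where irregular designs can hurt.

```python
import numpy as np

def nystrom_approx(K, m, rng):
    """Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T from m
    uniformly sampled columns.  If the sampled columns miss a region
    of the input space (easy under irregular data distributions),
    the approximation degrades, whereas a random projection mixes
    information from all columns."""
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = K[:, idx]                    # n x m sampled-column block
    W = K[np.ix_(idx, idx)]          # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T
```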
Implications and Future Directions
The implications of this work are noteworthy for computationally efficient statistical estimation in large-scale settings, where reducing time and space complexity without sacrificing statistical performance is crucial. The approach also opens a promising avenue for applying randomized sketches to other high-dimensional statistical estimation tasks, potentially extending beyond regression problems.
In conclusion, by leveraging randomized sketching methods and focusing on the kernel matrix's statistical dimension, the paper provides compelling evidence and a theoretical groundwork for efficiently approximating KRR. This contributes significantly to the field’s growing interest in scaling non-parametric methods to large data applications.