Randomized Sketches for Kernels: Fast and Optimal Non-Parametric Regression
The paper tackles the computational inefficiency inherent in Kernel Ridge Regression (KRR) by proposing a novel approximation technique leveraging randomized sketches. KRR is a robust method for non-parametric regression in reproducing kernel Hilbert spaces (RKHS), yet its applicability is often hampered by computational demands, with time complexity scaling as O(n³) and space complexity as O(n²). This computational overhead becomes prohibitive as the sample size n increases.
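For concreteness, here is a minimal numpy sketch of the exact KRR solve under the standard closed form; the variable names and objective scaling are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def krr_exact(K, y, lam):
    """Exact KRR: minimize (1/2n)||y - K w||^2 + (lam/2) w' K w.

    A minimizer satisfies (K + n*lam*I) w = y.  The dense linear
    solve is the O(n^3) time bottleneck, and simply storing the
    kernel matrix K already costs O(n^2) memory.
    """
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)
```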
The authors address this challenge by approximating KRR with m-dimensional randomized sketches of the kernel matrix, reducing computation while retaining minimax optimality. The core question is how small the sketch dimension m can be chosen while still preserving the statistical optimality of the approximation.
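A plausible rendering of the sketched estimator is shown below, under the natural assumption that the n-dimensional coefficient vector is restricted to the row space of a sketch matrix S of shape (m, n); the exact program and normalization in the paper may differ.

```python
import numpy as np

def krr_sketched(K, y, lam, S):
    """Sketched KRR: restrict w = S.T @ alpha with alpha in R^m.

    Substituting into the KRR objective yields an m-dimensional
    problem: (S K^2 S'/n + lam * S K S') alpha = S K y / n.
    The dominant costs become forming S @ K (cheaper still with
    structured sketches) and an O(m^3) solve, with m << n.
    """
    n = K.shape[0]
    SK = S @ K                                # (m, n)
    A = SK @ SK.T / n + lam * (SK @ S.T)      # m x m system matrix
    b = SK @ y / n
    alpha = np.linalg.solve(A, b)
    return S.T @ alpha                        # lift back to n kernel weights
```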
Methodological Overview
Several types of randomized sketches are explored, including those based on Gaussian random matrices and randomized Hadamard matrices. The essential insight is that it suffices to choose the sketch dimension m proportional to the statistical dimension of the kernel matrix, up to logarithmic factors, thus circumventing the costly full-rank computation while still achieving minimax optimality.
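The paper defines the statistical dimension through the kernel matrix's eigendecay and a critical radius; as a rough, computable proxy one can use the effective degrees of freedom tr(K(K + n·lam·I)^{-1}), sketched below. The proxy choice is our assumption, not the paper's exact definition.

```python
import numpy as np

def effective_dimension(K, lam):
    """Effective degrees of freedom tr(K (K + n*lam*I)^{-1}).

    This eigenvalue-based quantity serves as a proxy for the
    statistical dimension that governs how large the sketch size m
    must be, up to constants and logarithmic factors.
    """
    n = K.shape[0]
    mu = np.linalg.eigvalsh(K)        # eigenvalues of the kernel matrix
    return float(np.sum(mu / (mu + n * lam)))
```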
The statistical dimension of the kernel is pivotal in assessing approximation quality: it quantifies the kernel matrix's effective degrees of freedom. With this notion in hand, the paper scrutinizes two sketch families:
- Sub-Gaussian Sketches: matrices with i.i.d. zero-mean 1-sub-Gaussian rows, for which the sketch dimension m can scale directly with the statistical dimension.
- ROS Sketches: randomized orthogonal system sketches (e.g., built from Hadamard or discrete Fourier transform matrices), for which m must additionally carry logarithmic factors. Both constructions are illustrated in the code sketch after this list.
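A minimal construction of both sketch families, assuming the standard scalings (Gaussian entries of variance 1/m; a subsampled randomized Hadamard transform for ROS). Details such as the row-sampling distribution are simplifying assumptions.

```python
import numpy as np
from scipy.linalg import hadamard

def gaussian_sketch(m, n, rng):
    """Sub-Gaussian sketch: i.i.d. N(0, 1/m) entries, so each row is
    zero-mean and 1-sub-Gaussian after the 1/sqrt(m) scaling."""
    return rng.standard_normal((m, n)) / np.sqrt(m)

def ros_sketch(m, n, rng):
    """ROS sketch built from a Hadamard matrix: S = sqrt(n/m) P H D,
    where D is a random-sign diagonal, H is the orthonormal Hadamard
    transform, and P samples m rows uniformly.  Requires n to be a
    power of two (in practice one pads the data)."""
    D = rng.choice([-1.0, 1.0], size=n)            # Rademacher signs
    H = hadamard(n) / np.sqrt(n)                   # orthonormal Hadamard
    rows = rng.choice(n, size=m, replace=True)     # uniform row sampling
    return np.sqrt(n / m) * (H[rows] * D)
```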
Empirical and Theoretical Results
The authors provide a rigorous theoretical foundation, proving that for Gaussian and ROS sketches the sketched KRR retains minimax optimality under the stated dimensionality conditions. Simulation results corroborate these predictions, showing that the sketched estimator matches the performance of exact KRR across various kernel classes, including polynomial and Gaussian kernels and Sobolev spaces.
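An illustrative, self-contained simulation in the spirit of those experiments (not the paper's actual setup): a Gaussian-kernel regression on synthetic data, comparing exact KRR against a Gaussian-sketched solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 512, 64, 1e-3

# Synthetic 1-d regression problem with a Gaussian kernel.
x = np.sort(rng.uniform(-1.0, 1.0, n))
f_true = np.sin(4 * np.pi * x)
y = f_true + 0.5 * rng.standard_normal(n)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2)

# Exact KRR: O(n^3) solve.
w_exact = np.linalg.solve(K + n * lam * np.eye(n), y)

# Gaussian-sketched KRR: m-dimensional solve.
S = rng.standard_normal((m, n)) / np.sqrt(m)
SK = S @ K
alpha = np.linalg.solve(SK @ SK.T / n + lam * (SK @ S.T), SK @ y / n)
w_sketch = S.T @ alpha

for name, w in [("exact", w_exact), ("sketched", w_sketch)]:
    print(f"{name:8s} in-sample MSE vs truth: {np.mean((K @ w - f_true) ** 2):.4f}")
```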
The paper further contrasts the method with Nyström-based approaches, which form low-rank approximations to the kernel matrix by sampling columns. The authors show that Nyström methods can fare poorly on irregular data distributions precisely because of this dependence on column sampling, unlike the randomized sketches proposed here.
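For contrast, a generic uniform-column-sampling Nyström approximation is sketched below; this is the textbook construction, not necessarily the exact variant the paper analyzes. Its accuracy hinges on the sampled columns covering the data, which is where irregular designs can hurt.

```python
import numpy as np

def nystrom_approx(K, m, rng):
    """Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T from m
    uniformly sampled columns.  If the sampled columns miss a region
    of the input space (easy under irregular data distributions),
    the approximation degrades, whereas a random projection mixes
    information from all columns."""
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = K[:, idx]                    # n x m sampled-column block
    W = K[np.ix_(idx, idx)]          # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T
```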
Implications and Future Directions
The implications of this work are noteworthy for computationally efficient statistical estimation in large-scale settings, where reducing time and space complexity without sacrificing statistical performance is crucial. The approach also opens a promising avenue for applying randomized sketches to other high-dimensional statistical estimation tasks, potentially extending beyond regression problems.
In conclusion, by leveraging randomized sketching methods and focusing on the kernel matrix's statistical dimension, the paper provides compelling evidence and a theoretical groundwork for efficiently approximating KRR. This contributes significantly to the field’s growing interest in scaling non-parametric methods to large data applications.