Exact Gaussian Processes on a Million Data Points (1903.08114v2)

Published 19 Mar 2019 in cs.LG, cs.DC, and stat.ML

Abstract: Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication. By partitioning and distributing kernel matrix multiplies, we demonstrate that an exact GP can be trained on over a million points, a task previously thought to be impossible with current computing hardware, in less than 2 hours. Moreover, our approach is generally applicable, without constraints to grid data or specific kernel classes. Enabled by this scalability, we perform the first-ever comparison of exact GPs against scalable GP approximations on datasets with $10^4\!-\!10^6$ data points, showing dramatic performance improvements.

Authors (6)
  1. Ke Alexander Wang (11 papers)
  2. Geoff Pleiss (41 papers)
  3. Stephen Tyree (29 papers)
  4. Kilian Q. Weinberger (105 papers)
  5. Andrew Gordon Wilson (133 papers)
  6. Jacob R. Gardner (39 papers)
Citations (214)

Summary

  • The paper introduces a novel multi-GPU parallelization method for exact Gaussian Process inference on datasets with over one million points.
  • It employs preconditioned conjugate gradients and iterative techniques to reduce computational complexity and memory usage.
  • Benchmarking shows significantly lower RMSE compared to approximate methods, underscoring enhanced scalability and precision.

Exact Gaussian Processes on a Million Data Points

The paper "Exact Gaussian Processes on a Million Data Points" authored by Ke Alexander Wang et al. addresses the significant computational challenges associated with scaling Gaussian Processes (GPs) to large datasets. The authors propose a novel approach leveraging multi-GPU parallelization and iterative methods to perform exact GP inference efficiently on over a million training points. This work is pivotal as it demonstrates feasibility in a regime traditionally dominated by approximate GP methods due to computational constraints.

Background

Gaussian processes are non-parametric models whose capacity grows with the available data, with applications ranging from black-box optimization to time-series forecasting. The principal obstacle to their wider use has been the $\mathcal{O}(n^3)$ computational cost of exact inference, which has restricted exact GPs to datasets of fewer than about ten thousand points and motivated a range of scalable approximations.
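
For reference, these are the standard GP regression equations (textbook notation, not taken from the paper); the expensive step is the solve against the $n \times n$ kernel matrix:

```latex
% Exact GP posterior at a test point x_*, given training inputs X,
% targets y, kernel k, and observation noise variance sigma^2:
\mu_* = \mathbf{k}_*^{\top} \left(K_{XX} + \sigma^2 I\right)^{-1} \mathbf{y},
\qquad
\sigma_*^2 = k(\mathbf{x}_*, \mathbf{x}_*)
  - \mathbf{k}_*^{\top} \left(K_{XX} + \sigma^2 I\right)^{-1} \mathbf{k}_*
```

Solving these systems with a Cholesky factorization takes $\mathcal{O}(n^3)$ time and $\mathcal{O}(n^2)$ memory, which is exactly the bottleneck the iterative approach described below avoids.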

Methodological Innovation

The core methodological contribution is Blackbox Matrix-Matrix (BBMM) inference: all required linear solves access the kernel matrix only through matrix multiplication, carried out with conjugate gradients on GPU hardware. Because the kernel matrix never needs to be formed explicitly, memory complexity drops to linear in the number of observations per GPU. This permits exact inference on large datasets without restricting to structured (e.g., grid) data or particular kernel classes.
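
A minimal sketch of this idea (illustrative names, not the authors' implementation; preconditioning is omitted for brevity): conjugate gradients solves $(K + \sigma^2 I)\mathbf{x} = \mathbf{y}$ while touching the kernel only through a black-box multiply, so kernel blocks can be generated on the fly and discarded.

```python
import torch

def conjugate_gradients(matvec, y, tol=1e-4, max_iters=1000):
    """Solve A x = y given only a function that computes A @ v.

    A (here K + sigma^2 I) is never formed explicitly; the solver
    itself only keeps a few length-n vectors in memory.
    """
    x = torch.zeros_like(y)
    r = y - matvec(x)            # residual
    p = r.clone()                # search direction
    rs_old = r.dot(r)
    for _ in range(max_iters):
        Ap = matvec(p)
        alpha = rs_old / p.dot(Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r.dot(r)
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def rbf_matvec(X, v, noise=0.1, lengthscale=1.0, block=4096):
    """(K + sigma^2 I) @ v for an RBF kernel, built block-by-block."""
    out = torch.empty_like(v)
    for i in range(0, X.shape[0], block):
        d2 = torch.cdist(X[i:i + block], X).pow(2)
        out[i:i + block] = torch.exp(-0.5 * d2 / lengthscale**2) @ v
    return out + noise * v
```

For example, the representer weights for the predictive mean would be obtained as `conjugate_gradients(lambda v: rbf_matvec(train_x, v), train_y)`, where `train_x` and `train_y` are placeholder names for the training data.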

Key Techniques:

  1. Multi-GPU Parallelization: The authors distribute matrix multiplication tasks across multiple GPUs, considerably decreasing the necessary memory footprint and accelerating computation.
  2. Preconditioned Conjugate Gradients: Utilizing advanced iterative solvers reduces convergence time, enhancing scalability without compromising precision.
  3. Kernel Matrix Partitioning: By partitioning the kernel matrix into blocks and distributing the workload, memory constraints are minimized while computation remains efficient, requiring only $\mathcal{O}(n)$ memory per GPU (see the sketch after this list).
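
A hedged sketch of the partitioning scheme (simplified and illustrative; a real implementation would launch the per-device work asynchronously rather than in a Python loop): each GPU owns a contiguous range of kernel rows and computes its slice of the matrix-vector product in small pieces that are discarded immediately, so no device ever stores more than a thin block of the kernel.

```python
import torch

def partitioned_matvec(X, v, devices, lengthscale=1.0, noise=0.1, block=2048):
    """Compute (K + sigma^2 I) @ v with rows of K split across GPUs.

    Each device materializes at most a (block x n) kernel piece at a
    time, keeping per-GPU memory linear in n for a fixed block size.
    """
    n, p = X.shape[0], len(devices)
    bounds = [round(i * n / p) for i in range(p + 1)]
    outputs = []
    for dev, lo, hi in zip(devices, bounds[:-1], bounds[1:]):
        X_dev = X.to(dev)                    # full inputs (kernel columns)
        v_dev = v.to(dev)
        out_dev = torch.empty(hi - lo, device=dev)
        for i in range(lo, hi, block):
            j = min(i + block, hi)
            d2 = torch.cdist(X_dev[i:j], X_dev).pow(2)
            piece = torch.exp(-0.5 * d2 / lengthscale**2)  # (j-i) x n block
            out_dev[i - lo:j - lo] = piece @ v_dev         # piece freed next loop
        outputs.append(out_dev.cpu())
    return torch.cat(outputs) + noise * v    # add the sigma^2 I term

# devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
# Kv = partitioned_matvec(train_x, vec, devices)
```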

Results and Implications

The empirical evaluation is the first to train and benchmark exact GPs on datasets of up to a million points. Against popular approximate methods such as SGPR and SVGP, the exact GP outperformed on nearly all tasks, often by substantial margins in root-mean-squared error (RMSE).

These results underscore the strength of non-parametric models: exact GPs inherently benefit from additional data, improving rather than degrading as datasets grow, without the accuracy trade-offs that approximate methods introduce. Exact GPs thus stand out as the more reliable choice when datasets exceed the limits traditionally assumed for exact inference.

Future Perspectives

Going forward, the ability to apply exact GPs on datasets previously seen as intractable opens up new possibilities for AI and machine learning applications:

  • Broader Application: Industries requiring high precision, such as healthcare and financial forecasting, can benefit significantly from these advancements.
  • Algorithmic Improvements: Continual enhancement of iterative solvers and parallel computing frameworks will likely further refine the efficiencies realized here.
  • Theoretical Developments: The demonstrated scalability lays a foundation for analyzing and designing new algorithms within this expanded regime of exact, large-scale GP inference.

In conclusion, this work represents a substantive advancement in Gaussian Process methodology, offering a pathway to embrace larger datasets without resorting to approximations, paving the way for future developments in both theoretical and applied machine learning landscapes.