- The paper presents its main contribution by categorizing scalable GP methods into global and local approaches to address cubic complexity in big data.
- It details global approximations such as Subset-of-Data, sparse kernels, and inducing-point methods, which reduce computational cost while trading off prediction quality to varying degrees.
- The review emphasizes future research directions such as integrating GPs with deep learning and online adaptations for effective high-dimensional modeling.
A Review of Scalable Gaussian Processes
This paper presents a comprehensive review of scalable Gaussian Processes (GPs) in the context of big data, focusing on methods that tackle the O(n³) training complexity of exact GP regression. The authors categorize scalable GPs into global and local approximations and explore each approach in depth.
Global Approximations
Global approximations improve scalability by approximating the full n × n kernel matrix:
- Subset-of-Data (SoD): Trains an exact GP on a subset of the training data to reduce computational complexity. Although it is the simplest option, SoD struggles to maintain prediction quality and often yields overconfident variances because the subset represents only part of the data.
- Sparse Kernels: Directly modify the kernel itself, using Compactly Supported (CS) kernels that set covariances to exactly zero beyond a cutoff distance, yielding a sparse kernel matrix (a minimal sketch follows this list). This reduces time complexity but requires care to preserve the positive semi-definiteness of the induced kernel matrix.
- Sparse Approximations: Leverage a small set of m inducing points to summarize the data, resulting in a low-rank approximation of the kernel matrix. Variants include:
  - Prior Approximations: Modify the joint prior to improve scalability, such as Subset-of-Regressors (SoR) and the Fully Independent Conditional (FIC) approximation; see the SoR sketch after this list.
  - Posterior Approximations: Utilize variational inference to approximate the posterior directly, significantly improving model performance over prior approximations.
  - Structured Sparse Approximations: Exploit algebraic structure in the kernel matrix, such as Kronecker products, to accelerate computations further.
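To make the sparse-kernel idea concrete, the following is a minimal NumPy/SciPy sketch of a compactly supported Wendland kernel; the specific kernel form, support radius, and sparse storage format are illustrative assumptions rather than choices prescribed by the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix

def wendland_kernel(A, B, support=1.0, variance=1.0):
    """Compactly supported Wendland kernel (1 - r)_+^4 (4r + 1).

    Positive definite for inputs of dimension <= 3. Pairs of points farther
    apart than `support` get an exact zero, so the kernel matrix is sparse
    and sparse linear algebra can be used for training and prediction.
    """
    r = cdist(A, B) / support
    k = variance * np.clip(1.0 - r, 0.0, None) ** 4 * (4.0 * r + 1.0)
    return csr_matrix(k)  # keep only the non-zero entries

# Toy usage: with a small support radius, most entries of K are exactly zero.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, (500, 1))
K = wendland_kernel(X, X, support=0.5)
print(f"non-zero fraction: {K.nnz / 500**2:.3f}")
```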
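The inducing-point idea can likewise be illustrated with the Subset-of-Regressors predictive equations, which replace the n × n inverse by an m × m one at O(nm²) cost. The sketch below is a minimal NumPy illustration; the RBF kernel, hyperparameters, and inducing-input placement are assumptions made for the example, not the paper's implementation.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-vector inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sor_predict(X, y, Xs, Z, noise=0.1, jitter=1e-6):
    """Subset-of-Regressors prediction with m inducing inputs Z.

    Cost is O(n m^2) instead of the O(n^3) of exact GP regression,
    where n = len(X) and m = len(Z).
    """
    Kmm = rbf(Z, Z) + jitter * np.eye(len(Z))
    Kmn = rbf(Z, X)
    Ksm = rbf(Xs, Z)
    # Sigma = (Kmm + noise^-2 Kmn Knm)^-1: the m x m matrix that replaces
    # the full n x n inverse in the exact GP predictive equations.
    Sigma = np.linalg.inv(Kmm + Kmn @ Kmn.T / noise**2)
    mean = Ksm @ Sigma @ Kmn @ y / noise**2
    # Note: the SoR variance is known to be overconfident far from the
    # inducing inputs, a limitation of prior approximations.
    var = np.sum(Ksm @ Sigma * Ksm, axis=1)  # diagonal of Ksm Sigma Kms
    return mean, var

# Toy usage: 2000 noisy samples of a sine, summarized by 30 inducing points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
Z = np.linspace(-3, 3, 30)[:, None]
Xs = np.linspace(-3, 3, 100)[:, None]
mu, var = sor_predict(X, y, Xs, Z)
```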
Local Approximations
Local approximations enhance scalability by focusing on subsets of the data:
- Naive-Local-Experts (NLE): Make predictions using only a localized subset of the data. They are cheap but often produce discontinuous predictions at subset boundaries and generalize poorly because global correlations are ignored.
- Mixture-of-Experts (MoE): Combine diverse local models through gating functions that weight each expert, improving accuracy and reliability. This allows the model to adapt to non-stationary data better than naive local models.
- Product-of-Experts (PoE): Aggregate the experts' Gaussian predictions with per-expert weights, aiming to mitigate the overconfidence seen in naive combinations (a small aggregation sketch follows this list).
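As a concrete illustration of the aggregation step, below is a minimal NumPy sketch of a generalized PoE combination, which merges the experts' Gaussian predictions by weighting their precisions; the uniform default weights are an illustrative assumption, not the paper's specific scheme.

```python
import numpy as np

def gpoe_aggregate(means, variances, betas=None):
    """Generalized Product-of-Experts aggregation of per-expert GP predictions.

    means, variances: arrays of shape (n_experts, n_test) holding each local
    expert's predictive mean and variance at the test points.
    betas: optional per-expert weights of the same shape; uniform weights
    summing to one help avoid the overconfidence of the naive product.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if betas is None:
        betas = np.full_like(means, 1.0 / means.shape[0])
    precision = np.sum(betas / variances, axis=0)          # aggregated precision
    agg_var = 1.0 / precision
    agg_mean = agg_var * np.sum(betas * means / variances, axis=0)
    return agg_mean, agg_var

# Toy usage: two experts disagreeing at the second of two test points.
m = np.array([[0.0, 1.0], [0.5, 3.0]])
v = np.array([[1.0, 0.5], [1.0, 2.0]])
mu, var = gpoe_aggregate(m, v)
```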
Implications and Future Directions
Scalable GPs hold great promise for large-scale datasets. However, the paper identifies several open challenges, including the integration of GPs with deep learning architectures, manifold learning, and online adaptation, all while handling high-dimensional input spaces efficiently.
For instance, combining GPs with neural networks or manifold learning can improve their ability to model complex, high-dimensional relationships. For online adaptation, efficient updating mechanisms are needed so that models can learn in real time as new data arrive.
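As one example of the kind of updating mechanism involved, the sketch below shows a standard rank-one extension of a Cholesky factor, which folds a single new observation into an exact GP model in O(n²) instead of refactorizing at O(n³); the kernel and all names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_append(L, k_new, k_self, noise_var):
    """Extend the Cholesky factor L of (K + noise_var*I) with one new point.

    L: (n, n) lower-triangular factor of the current training covariance.
    k_new: (n,) covariances between the new input and the existing inputs.
    k_self: scalar prior variance of the new input, k(x_new, x_new).
    Returns the (n+1, n+1) factor in O(n^2) rather than the O(n^3) cost of
    refactorizing the enlarged matrix from scratch.
    """
    n = L.shape[0]
    l12 = solve_triangular(L, k_new, lower=True)       # solve L l12 = k_new
    l22 = np.sqrt(k_self + noise_var - l12 @ l12)      # Schur complement
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l12
    L_new[n, n] = l22
    return L_new

# Toy usage: extend the factor of a 5-point RBF model with a new observation.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 1))
x_new = np.array([[0.3]])
sqdist = lambda A, B: (A - B.T) ** 2                   # 1-D squared distances
noise_var = 0.01
L = np.linalg.cholesky(np.exp(-0.5 * sqdist(X, X)) + noise_var * np.eye(5))
L_new = cholesky_append(L, np.exp(-0.5 * sqdist(X, x_new)).ravel(), 1.0, noise_var)
```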
Conclusion
The review effectively highlights the significant advancements and ongoing challenges in scalable GPs, offering a valuable resource for researchers aiming to apply GPs to massive datasets. Future research is encouraged to focus on improving scalability while maintaining the interpretability and flexibility inherent in GP models.