Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel (2205.07384v8)
Abstract: It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently used in remote sensing, which is subject to strong seasonal effects. We propose to blend the strengths of deep learning with the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality). We implement this idea by combining a deep network with an efficient mapping based on the Nyström approximation, which we call the Implicit Composite Kernel (ICK). We then adopt a sample-then-optimize approach to approximate the full GP posterior distribution. We demonstrate that ICK has superior performance and flexibility on both synthetic and real-world data sets. We believe the ICK framework can be used to incorporate prior information into neural networks in many applications.
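To make the construction concrete, below is a minimal PyTorch sketch of the idea the abstract describes: one latent map is a neural network (whose inner product implicitly defines a kernel), the other is a finite-dimensional Nyström feature map of an explicitly chosen kernel, and the two are combined by an inner product. The RBF kernel, layer widths, latent dimension `p`, and names such as `NystromFeatureMap` are illustrative assumptions for this sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def rbf_kernel(x1, x2, lengthscale=1.0):
    """RBF kernel matrix between two sets of scalar inputs."""
    d2 = (x1.unsqueeze(1) - x2.unsqueeze(0)) ** 2
    return torch.exp(-0.5 * d2 / lengthscale ** 2)

class NystromFeatureMap(nn.Module):
    """Finite-dimensional map phi(x) with <phi(x), phi(x')> ~= K(x, x'),
    built from the Nystrom approximation on p inducing points z."""
    def __init__(self, inducing_points, lengthscale=1.0):
        super().__init__()
        self.lengthscale = lengthscale
        self.register_buffer("z", inducing_points)  # (p,)
        K_zz = rbf_kernel(inducing_points, inducing_points, lengthscale)
        eigvals, eigvecs = torch.linalg.eigh(K_zz)
        eigvals = eigvals.clamp_min(1e-6)  # numerical safety
        # phi(x) = K_xz U diag(1/sqrt(lambda)), so phi(x) phi(x')^T
        # approximates K_xz K_zz^{-1} K_zx (the Nystrom approximation).
        self.register_buffer("proj", eigvecs / eigvals.sqrt())

    def forward(self, x):
        K_xz = rbf_kernel(x, self.z, self.lengthscale)  # (n, p)
        return K_xz @ self.proj                         # (n, p)

class ICK(nn.Module):
    """One latent map per information source: an NN (implicit kernel) for the
    high-dimensional input, a Nystrom map for the kernel-modeled covariate.
    Their inner product yields a product composite kernel."""
    def __init__(self, in_dim, p, inducing_points):
        super().__init__()
        self.nn_map = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, p))
        self.kernel_map = NystromFeatureMap(inducing_points)

    def forward(self, x_feat, x_time):
        f = self.nn_map(x_feat)       # (n, p) NN-defined features
        g = self.kernel_map(x_time)   # (n, p) explicit-kernel features
        return (f * g).sum(dim=-1)    # (n,) predictions

# Hypothetical usage: 16 inducing points on [0, 1] for a time covariate.
model = ICK(in_dim=128, p=16, inducing_points=torch.linspace(0.0, 1.0, 16))
y_hat = model(torch.randn(8, 128), torch.rand(8))
```

For the sample-then-optimize step mentioned in the abstract, one would train an ensemble of such models from independent random initializations; each converged network then serves as an approximate sample from the GP posterior.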
Authors: Ziyang Jiang, Tongshu Zheng, Yiling Liu, David Carlson