
Enhancing Kernel Flexibility via Learning Asymmetric Locally-Adaptive Kernels (2310.05236v1)

Published 8 Oct 2023 in cs.LG

Abstract: The lack of sufficient flexibility is the key bottleneck of kernel-based learning, which relies on manually designed, pre-given, and non-trainable kernels. To enhance kernel flexibility, this paper introduces the concept of Locally-Adaptive-Bandwidths (LAB) as trainable parameters that augment the Radial Basis Function (RBF) kernel, giving rise to the LAB RBF kernel. The parameters in LAB RBF kernels are data-dependent, and their number can grow with the size of the dataset, allowing for better adaptation to diverse data patterns and enhancing the flexibility of the learned function. This newfound flexibility also brings challenges, particularly with regard to asymmetry and the need for an efficient learning algorithm. To address these challenges, the paper establishes, for the first time, an asymmetric kernel ridge regression framework and introduces an iterative kernel learning algorithm. This approach not only reduces the demand for extensive support data but also significantly improves generalization by training the bandwidths on the available training data. Experimental results on real datasets underscore the remarkable performance of the proposed algorithm, showcasing its superior capability in handling large-scale datasets compared to Nyström approximation-based algorithms. Moreover, it achieves a significant improvement in regression accuracy over existing kernel-based learning methods and even surpasses residual neural networks.
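To make the construction described in the abstract concrete, below is a minimal NumPy sketch of a LAB-style RBF kernel in which each support point carries its own trainable bandwidth vector, together with a ridge-regression fit against the resulting rectangular kernel matrix. The toy data, variable names, learning rate, and the simple alternating gradient step on the bandwidths are illustrative assumptions for exposition; they stand in for, and do not reproduce, the paper's iterative kernel learning algorithm.

```python
import numpy as np


def lab_rbf_kernel(X, centers, bandwidths):
    """Asymmetric LAB-style RBF kernel matrix.

    K[n, m] = exp(-||bandwidths[m] * (X[n] - centers[m])||^2): each support
    point carries its own per-dimension bandwidth vector, so in general
    K(x, c) != K(c, x) and the kernel is asymmetric by construction.
    """
    diff = X[:, None, :] - centers[None, :, :]            # (n, m, d)
    scaled = diff * bandwidths[None, :, :]                 # center-specific scaling
    return np.exp(-np.sum(scaled ** 2, axis=-1))           # (n, m)


def asymmetric_krr_fit(K, y, lam=1e-3):
    """Ridge coefficients for a rectangular (asymmetric) kernel matrix.

    Solves min_alpha ||K @ alpha - y||^2 + lam * ||alpha||^2 in closed form.
    """
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(m), K.T @ y)


# Toy 1-D regression problem (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(200, 1))
y_train = np.sin(2.0 * X_train[:, 0]) + 0.1 * rng.standard_normal(200)

centers = X_train[:20].copy()                              # support data: a small subset
bandwidths = np.ones_like(centers)                         # initial bandwidths (to be learned)

# Alternating scheme (a simplified stand-in for the paper's algorithm):
# solve the ridge problem for alpha, then take a gradient step on the bandwidths.
lr = 0.01
for _ in range(50):
    K = lab_rbf_kernel(X_train, centers, bandwidths)
    alpha = asymmetric_krr_fit(K, y_train)
    resid = K @ alpha - y_train                            # (n,)
    diff = X_train[:, None, :] - centers[None, :, :]       # (n, m, d)
    # dK[n, m]/d bandwidths[m, d] = -2 * K[n, m] * bandwidths[m, d] * diff[n, m, d]^2
    grad = np.einsum("n,nm,nmd->md", resid, -2.0 * K * alpha[None, :],
                     bandwidths[None, :, :] * diff ** 2)
    bandwidths = np.maximum(bandwidths - lr * grad / len(X_train), 1e-3)

K = lab_rbf_kernel(X_train, centers, bandwidths)
alpha = asymmetric_krr_fit(K, y_train)
print("train RMSE:", np.sqrt(np.mean((K @ alpha - y_train) ** 2)))
```

Attaching the bandwidths to the support points, rather than to both kernel arguments symmetrically, is what makes the kernel matrix rectangular and asymmetric; this is why the abstract calls for an asymmetric kernel ridge regression framework rather than the standard symmetric formulation.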
