
Algorithms for Learning Kernels Based on Centered Alignment (1203.0550v3)

Published 2 Mar 2012 in cs.LG and cs.AI

Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.


Summary

  • The paper proposes novel alignment-based kernel learning methods, including independent, joint, and single-stage algorithms optimized via quadratic programming.
  • It establishes theoretical guarantees with new concentration and stability-based generalization bounds that underpin improved kernel ridge regression performance.
  • Empirical results show that centered alignment methods consistently outperform uniform kernel combinations in diverse tasks such as sentiment analysis and regression.

Analysis of "Algorithms for Learning Kernels Based on Centered Alignment"

The paper by Cortes, Mohri, and Rostamizadeh introduces novel algorithms for learning kernels based on centered alignment, targeting improvements over the traditional uniform kernel combination as well as other kernel learning methods in both classification and regression tasks.

Core Contributions

The paper proposes kernel learning algorithms rooted in centered alignment, a similarity measure between kernels or kernel matrices. The authors present several algorithms built on this measure and demonstrate consistent empirical improvements over the uniform combination of base kernels, a standard baseline that has proven difficult to surpass in previous work.
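For concreteness, the centered alignment measure underlying these algorithms can be written out as follows (this restates the paper's definitions; $\mathbf{1}$ denotes the all-ones vector and $\langle \cdot, \cdot \rangle_F$ the Frobenius inner product):

```latex
% Centering of an m x m kernel matrix K:
K_c = \Big(I - \tfrac{\mathbf{1}\mathbf{1}^\top}{m}\Big)\, K\, \Big(I - \tfrac{\mathbf{1}\mathbf{1}^\top}{m}\Big)

% Centered alignment between two kernel matrices K and K':
\hat{\rho}(K, K') = \frac{\langle K_c, K'_c \rangle_F}{\|K_c\|_F\, \|K'_c\|_F}
```

For binary labels $y \in \{-1, +1\}^m$, the target kernel matrix used in the alignment computations is $K_Y = y y^\top$.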

Key Algorithmic Innovations:

  1. Independent and Joint Alignment-Based Algorithms:
    • An independent alignment-based method (align) assigns each base kernel a weight proportional to its centered alignment with the target kernel induced by the labels.
    • A joint alignment-based method (alignf) instead selects the kernel combination that maximizes centered alignment with the target kernel; the authors show that this optimization reduces to a simple quadratic program (see the sketch after this list).
  2. Single-Stage Algorithm:
    • In addition to the two-stage techniques, the paper presents a single-stage algorithm that uses an alignment-based regularization to learn the kernel combination and the hypothesis simultaneously.
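The following minimal NumPy/SciPy sketch illustrates both two-stage weighting schemes. The function names, the normalization choices, and the use of an off-the-shelf bounded L-BFGS-B solver in place of a dedicated QP solver are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize  # assumption: SciPy is available

def center(K):
    """Center a kernel matrix: K_c = (I - 11^T/m) K (I - 11^T/m)."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered alignment <K1_c, K2_c>_F / (||K1_c||_F ||K2_c||_F)."""
    K1c, K2c = center(K1), center(K2)
    den = np.linalg.norm(K1c, "fro") * np.linalg.norm(K2c, "fro")
    return float(np.sum(K1c * K2c) / den) if den > 0 else 0.0

def align_weights(base_kernels, y):
    """Independent method: weight each base kernel by its centered
    alignment with the target kernel yy^T (normalized to sum to one
    here purely for convenience)."""
    Ky = np.outer(y, y)
    w = np.array([centered_alignment(K, Ky) for K in base_kernels])
    w = np.clip(w, 0.0, None)
    return w / w.sum() if w.sum() > 0 else np.full(len(base_kernels), 1.0 / len(base_kernels))

def alignf_weights(base_kernels, y):
    """Joint method: following the reduction described in the paper, solve
        min_{v >= 0}  v^T M v - 2 v^T a
    with M[k, l] = <K_k^c, K_l^c>_F and a[k] = <K_k^c, (yy^T)_c>_F,
    then normalize the minimizer."""
    Ky_c = center(np.outer(y, y))
    Kc = [center(K) for K in base_kernels]
    p = len(Kc)
    M = np.array([[np.sum(Kc[k] * Kc[l]) for l in range(p)] for k in range(p)])
    a = np.array([np.sum(Kc[k] * Ky_c) for k in range(p)])
    obj = lambda v: v @ M @ v - 2.0 * v @ a
    grad = lambda v: 2.0 * (M @ v - a)
    res = minimize(obj, x0=np.full(p, 1.0 / p), jac=grad,
                   bounds=[(0.0, None)] * p, method="L-BFGS-B")
    v = res.x
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else np.full(p, 1.0 / p)
```

The resulting weight vector μ defines the combined kernel K_μ = Σ_k μ_k K_k, which a two-stage method then hands to a standard learner such as an SVM or kernel ridge regression.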

Theoretical Insights

The research delivers a comprehensive theoretical grounding for centered alignment:

  • Concentration Bounds: The authors derive a novel bound showing that the centered alignment between kernel matrices concentrates around its expected value, which is crucial for establishing the reliability of empirical alignment estimates (a schematic form is sketched after this list).
  • Generalization Bounds: Stability-based generalization bounds are established for learning kernel algorithms, particularly when employing kernel ridge regression in the second stage. These bounds are critical for understanding the learning guarantees these new methods can provide.
  • Predictor Existence Theorems: The existence of accurate predictors is demonstrated under conditions of high centered alignment, for both classification and regression. This supports the theoretical soundness of using alignment as a learning criterion.
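As a rough schematic of the concentration result (constants and the paper's exact boundedness assumptions are omitted here), the bound has the following flavor: with probability at least $1 - \delta$ over an i.i.d. sample of size $m$,

```latex
\big| \hat{\rho}(K, K') - \rho(k, k') \big| \;\le\; O\!\left(\sqrt{\frac{\log(1/\delta)}{m}}\right)
```

where $\hat{\rho}$ is the alignment computed from the kernel matrices and $\rho$ the alignment of the underlying kernel functions. This $O(1/\sqrt{m})$ behavior is what licenses treating the sample alignment as a proxy for the population quantity in the two-stage algorithms.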

Empirical Validation

Experiments across various datasets validate the theoretical claims, showing that centered alignment-based methods consistently outperform uniform kernel combinations and other standard kernel learning approaches (a minimal two-stage usage sketch follows the list below):

  • Gaussian Base Kernels: With families of Gaussian kernels as base kernels, the alignment-based algorithms improve both prediction accuracy and alignment over the uniform combination.
  • Rank-One Kernels: Further experiments with rank-one base kernels constructed from sentiment analysis datasets show that these methods remain effective beyond general-purpose kernels.
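As a hedged illustration of the two-stage pipeline behind the regression experiments (the Gaussian bandwidths, ridge parameter, and synthetic data below are placeholders, not the paper's experimental settings), the following sketch combines Gaussian base kernels with independent alignment-based weights and trains kernel ridge regression on the resulting kernel:

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x - x'||^2)."""
    sq = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq)

def center(K):
    """Center a kernel matrix: K_c = (I - 11^T/m) K (I - 11^T/m)."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def alignment_weights(kernels, y):
    """Independent alignment-based weights (see the earlier sketch)."""
    Ky_c = center(np.outer(y, y))
    w = []
    for K in kernels:
        Kc = center(K)
        den = np.linalg.norm(Kc, "fro") * np.linalg.norm(Ky_c, "fro")
        w.append(np.sum(Kc * Ky_c) / den if den > 0 else 0.0)
    w = np.clip(np.array(w), 0.0, None)
    return w / w.sum()

# Illustrative data and hyperparameters (placeholders, not the paper's).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
gammas = [0.1, 1.0, 10.0]

# Stage 1: learn the kernel combination from centered alignment.
base = [gaussian_kernel(X, X, g) for g in gammas]
mu = alignment_weights(base, y)
K_mu = sum(m_k * K for m_k, K in zip(mu, base))

# Stage 2: kernel ridge regression with the learned combined kernel.
lam = 1e-2
alpha = np.linalg.solve(K_mu + lam * np.eye(len(y)), y)
y_fit = K_mu @ alpha  # in-sample predictions
```

For held-out predictions, the same weighted combination of Gaussian kernels would be evaluated between test and training points before applying the learned dual coefficients.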

Implications and Future Directions

The implications of this work span both theoretical and practical realms in kernel-based learning:

  • Theoretical Impact: The research fortifies the understanding that kernels with high alignment to target outputs lead to more effective predictors, supporting further exploration into learning kernels based on various measures of similarity.
  • Practical Applications: Given the consistent performance improvements observed, these algorithms have potential applications in diverse machine learning tasks where kernel methods are employed.

Future research could build on these insights by exploring other similarity measures for kernel learning, potentially creating newer, more efficient algorithms tailored to specific applications or datasets.

In conclusion, this work advances the field of learning kernels substantially, providing both refined theoretical insights and validated empirical methodologies, setting the stage for future innovation in kernel-based machine learning frameworks.
