Algorithms for Learning Kernels Based on Centered Alignment (1203.0550v3)
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.
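The abstract describes centered alignment as a similarity measure between kernel matrices but does not state the formula. As a minimal sketch, assuming the usual definition (center each kernel matrix in feature space, then take the Frobenius inner product of the centered matrices normalized by their Frobenius norms), it can be computed as follows. The function names `center`, `frobenius`, and `centered_alignment` and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def center(K):
    """Center a square kernel matrix: K_c = H K H with H = I - (1/m) * ones."""
    m = len(K)
    row_mean = [sum(row) / m for row in K]
    col_mean = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_mean) / m
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(m)]
        for i in range(m)
    ]

def frobenius(A, B):
    """Frobenius inner product <A, B>_F = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def centered_alignment(K1, K2):
    """Assumed definition: <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    K1c, K2c = center(K1), center(K2)
    denom = math.sqrt(frobenius(K1c, K1c) * frobenius(K2c, K2c))
    if denom == 0.0:
        raise ValueError("alignment is undefined when a centered kernel is zero")
    return frobenius(K1c, K2c) / denom

# Toy data (hypothetical): four 1-D points with binary labels.
x = [2.0, 1.0, -1.0, -2.0]
y = [1.0, 1.0, -1.0, -1.0]
K_linear = [[xi * xj for xj in x] for xi in x]   # linear base kernel
K_target = [[yi * yj for yj in y] for yi in y]   # target kernel y y^T

alignment = centered_alignment(K_linear, K_target)
```

In this toy setting the alignment between the linear kernel and the label kernel is high, which is the kind of signal an alignment-based kernel learning method would exploit when weighting base kernels. Because of the centering step, adding a constant to every entry of a kernel matrix leaves the value unchanged.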