Abstract and Applied Analysis

Error Bounds for ${l}^{p}$-Norm Multiple Kernel Learning with Least Square Loss

Shao-Gao Lv and Jin-De Zhu


Abstract

The problem of learning the kernel function as a linear combination of multiple kernels has attracted considerable attention recently in machine learning. In particular, by imposing an ${l}^{p}$-norm penalty on the kernel combination coefficients, multiple kernel learning (MKL) has proved useful and effective both for theoretical analysis and in practical applications (Kloft et al., 2009, 2011). In this paper, we present a theoretical analysis of the approximation error and learning ability of ${l}^{p}$-norm MKL. Our analysis yields explicit learning rates for ${l}^{p}$-norm MKL and demonstrates some notable advantages over traditional kernel-based learning algorithms in which the kernel is fixed.
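To make the setup concrete, below is a minimal sketch (not the authors' algorithm from the paper) of ${l}^{p}$-norm MKL with least-square loss, using the standard alternating scheme: kernel ridge regression on the weighted kernel combination, followed by the closed-form ${l}^{p}$-normalized weight update in the style of Kloft et al. The function name lp_mkl_ridge, the kernel list Ks, and all parameter names are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def lp_mkl_ridge(Ks, y, p=2.0, lam=1.0, n_iter=50):
    """Sketch of l_p-norm MKL with least-square (ridge) loss.

    Ks  : list of (n, n) precomputed kernel matrices K_1, ..., K_M
    y   : (n,) target vector
    p   : exponent of the l_p-norm constraint on kernel weights (p >= 1)
    lam : ridge regularization parameter
    """
    M, n = len(Ks), len(y)
    theta = np.full(M, M ** (-1.0 / p))  # uniform weights with ||theta||_p = 1

    for _ in range(n_iter):
        # Step 1: kernel ridge regression on the combined kernel K = sum_m theta_m K_m.
        K = sum(t * Km for t, Km in zip(theta, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)

        # Step 2: closed-form weight update.
        # ||f_m||^2 = theta_m^2 * alpha^T K_m alpha, then
        # theta_m is proportional to ||f_m||^(2/(p+1)), renormalized to unit l_p norm.
        norms_sq = np.array([t ** 2 * (alpha @ Km @ alpha) for t, Km in zip(theta, Ks)])
        norms_sq = np.maximum(norms_sq, 1e-12)   # numerical floor to avoid 0^negative
        theta = norms_sq ** (1.0 / (p + 1))
        theta /= np.linalg.norm(theta, ord=p)

    return alpha, theta
```

In this alternating scheme, the least-square loss is what makes both subproblems available in closed form. Values of p near 1 drive many kernel weights toward zero (sparse combinations), while large p keeps the weights nearly uniform, which is the trade-off the ${l}^{p}$-norm penalty is meant to control.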

Article information

Source
Abstr. Appl. Anal., Volume 2012 (2012), Article ID 915920, 18 pages.

Dates
First available in Project Euclid: 14 December 2012

Permanent link to this document
https://projecteuclid.org/euclid.aaa/1355495803

Digital Object Identifier
doi:10.1155/2012/915920

Mathematical Reviews number (MathSciNet)
MR2959739

Zentralblatt MATH identifier
1280.68177

Citation

Lv, Shao-Gao; Zhu, Jin-De. Error Bounds for ${l}^{p}$ -Norm Multiple Kernel Learning with Least Square Loss. Abstr. Appl. Anal. 2012 (2012), Article ID 915920, 18 pages. doi:10.1155/2012/915920. https://projecteuclid.org/euclid.aaa/1355495803


References

  • C. Cortes, M. Mohri, and A. Rostamizadeh, “Generalization bounds for learning kernels,” in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 247–254, June 2010.
  • M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, “${l}^{p}$-norm multiple kernel learning,” Journal of Machine Learning Research, vol. 12, pp. 953–997, 2011.
  • G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan, “Learning the kernel matrix with semidefinite programming,” Journal of Machine Learning Research, vol. 5, pp. 27–72, 2004.
  • C. A. Micchelli and M. Pontil, “Learning the kernel function via regularization,” Journal of Machine Learning Research, vol. 6, pp. 1099–1125, 2005.
  • N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
  • O. Bousquet and A. Elisseeff, “Stability and generalization,” Journal of Machine Learning Research, vol. 2, no. 3, pp. 499–526, 2002.
  • F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.
  • Y. K. Zhu and H. W. Sun, “Consistency analysis of spectral regularization algorithms,” Abstract and Applied Analysis, vol. 2012, Article ID 436510, 16 pages, 2012.
  • Y. Ying and D.-X. Zhou, “Learnability of Gaussians with flexible variances,” Journal of Machine Learning Research, vol. 8, pp. 249–276, 2007.
  • Q. Wu, Y. Ying, and D.-X. Zhou, “Learning rates of least-square regularized regression,” Foundations of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
  • T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 2009.
  • Q. Wu, Y. Ying, and D.-X. Zhou, “Multi-kernel regularized classifiers,” Journal of Complexity, vol. 23, no. 1, pp. 108–134, 2007.
  • M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society B, vol. 68, no. 1, pp. 49–67, 2006.
  • A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “More efficiency in multiple kernel learning,” in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, Corvallis, Ore, USA, June 2007.
  • M. Kloft and G. Blanchard, “The local Rademacher complexity of ${l}^{p}$-norm multiple kernel learning,” in Advances in Neural Information Processing Systems (NIPS '11), pp. 2438–2446, MIT Press, 2011.
  • A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes: With Applications to Statistics, Springer Series in Statistics, Springer, New York, NY, USA, 1996.
  • S. Smale and D.-X. Zhou, “Learning theory estimates via integral operators and their approximations,” Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.
  • S. Smale and D.-X. Zhou, “Shannon sampling. II. Connections to learning theory,” Applied and Computational Harmonic Analysis, vol. 19, no. 3, pp. 285–302, 2005.
  • C. A. Micchelli, M. Pontil, Q. Wu, and D.-X. Zhou, “Error bounds for learning the kernel,” Research Note 05–09, University College London, London, UK, 2005.
  • D.-X. Zhou, “Capacity of reproducing kernel spaces in learning theory,” IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1743–1752, 2003.
  • H. Sun and Q. Wu, “A note on application of integral operator in learning theory,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 416–421, 2009.
  • O. Chapelle, J. Weston, and B. Schölkopf, “Cluster kernels for semi-supervised learning,” in Advances in Neural Information Processing Systems (NIPS '03), pp. 585–592, MIT Press, 2003.
  • R. Johnson and T. Zhang, “Graph-based semi-supervised learning and spectral kernel design,” IEEE Transactions on Information Theory, vol. 54, no. 1, pp. 275–288, 2008.
  • U. von Luxburg and B. Schölkopf, “Statistical learning theory: models, concepts, and results”.
  • Y. Xu and H. Zhang, “Refinement of reproducing kernels,” Journal of Machine Learning Research, vol. 10, pp. 107–140, 2009.
  • V. Koltchinskii, “Sparsity in penalized empirical risk minimization,” Annales de l'Institut Henri Poincaré Probabilités et Statistiques, vol. 45, no. 1, pp. 7–57, 2009.