Electronic Journal of Statistics

Penalized empirical risk minimization over Besov spaces

Sébastien Loustau

Abstract

Kernel methods are closely related to the notion of a reproducing kernel Hilbert space (RKHS). A kernel machine minimizes the sum of an empirical cost and a stabilizer (usually the norm in the RKHS). In this paper we propose Besov spaces as alternative hypothesis spaces. We study the statistical performance of penalized empirical risk minimization for classification when the stabilizer is a Besov norm. More precisely, we establish fast rates of convergence to the Bayes rule. These rates are adaptive with respect to the regularity of the Bayes rule.
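
As a minimal sketch of the setting (the notation below is generic, not reproduced from the paper itself), the estimator described in the abstract has the penalized form

\hat{f}_n \in \operatorname*{arg\,min}_{f \in B^{s}_{p,q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \phi\bigl(Y_i f(X_i)\bigr) + \lambda_n \, \|f\|_{B^{s}_{p,q}} \right\},

where (X_1, Y_1), \dots, (X_n, Y_n) is the training sample with labels Y_i \in \{-1, +1\}, \phi is a surrogate classification loss, \lambda_n > 0 is a smoothing parameter, and \|\cdot\|_{B^{s}_{p,q}} is the norm of a Besov space, replacing the usual RKHS norm. The induced classifier is \operatorname{sign}(\hat{f}_n). Here "fast rates" refers to excess-risk bounds over the Bayes rule decaying faster than n^{-1/2}, and "adaptive" means such rates are attained without prior knowledge of the regularity index s.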

Article information

Source
Electron. J. Statist., Volume 3 (2009), 824–850.

Dates
First available in Project Euclid: 21 August 2009

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1250880017

Digital Object Identifier
doi:10.1214/08-EJS316

Mathematical Reviews number (MathSciNet)
MR2534203

Zentralblatt MATH identifier
1326.62157

Citation

Loustau, Sébastien. Penalized empirical risk minimization over Besov spaces. Electron. J. Statist. 3 (2009), 824–850. doi:10.1214/08-EJS316. https://projecteuclid.org/euclid.ejs/1250880017

