The Annals of Statistics

Simultaneous adaptation to the margin and to complexity in classification

Guillaume Lecué



We consider the problem of adaptation to the margin and to complexity in binary classification. We suggest an exponential weighting aggregation scheme and use it to construct classifiers that adapt automatically to both the margin and the complexity. Two main examples are worked out in which adaptivity is achieved in frameworks proposed by Steinwart and Scovel [Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279–294. Springer, Berlin; Ann. Statist. 35 (2007) 575–607] and Tsybakov [Ann. Statist. 32 (2004) 135–166]. Adaptive schemes, such as ERM or penalized ERM, usually involve a minimization step; our procedure does not.
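A generic exponential weighting aggregate can be sketched as follows. This is only an illustrative sketch, not the paper's exact procedure: the function names `exponential_weights` and `aggregate_predict` and the temperature parameter `beta` are assumptions for exposition, and in the paper the weights are built from the empirical risks of the base classifiers with a specific calibration.

```python
import numpy as np

def exponential_weights(risks, beta):
    """Weights w_j proportional to exp(-beta * R_n(f_j)), where R_n(f_j)
    is the empirical risk of the j-th base classifier.  Note that no
    minimization over j is performed; every classifier gets a weight."""
    r = np.asarray(risks, dtype=float)
    logw = -beta * (r - r.min())   # shift by the min for numerical stability
    w = np.exp(logw)
    return w / w.sum()

def aggregate_predict(classifiers, weights, X):
    """Convex aggregate of real-valued (soft) classifiers, thresholded
    at zero to produce labels in {-1, +1}."""
    votes = sum(w * f(X) for w, f in zip(weights, classifiers))
    return np.sign(votes)
```

Because the weights decay exponentially in the empirical risk, the aggregate concentrates on the better base classifiers as `beta` grows, while remaining a smooth functional of the data — which is what allows the oracle-inequality analysis to avoid a discrete minimization step.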

Article information

Ann. Statist., Volume 35, Number 4 (2007), 1698–1721.

First available in Project Euclid: 29 August 2007


Primary: 62G05: Estimation
Secondary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20] 68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}

Keywords: classification; statistical learning; fast rates of convergence; excess risk; aggregation; margin; complexity of classes of sets; SVM


Lecué, Guillaume. Simultaneous adaptation to the margin and to complexity in classification. Ann. Statist. 35 (2007), no. 4, 1698--1721. doi:10.1214/009053607000000055.



  • Audibert, J.-Y. and Tsybakov, A. B. (2005). Fast learning rates for plug-in classifiers under the margin condition. Preprint PMA-998. Available at
  • Bartlett, P., Jordan, M. and McAuliffe, J. (2006). Convexity, classification and risk bounds. J. Amer. Statist. Assoc. 101 138–156.
  • Birgé, L. (2006). Model selection via testing: An alternative to (penalized) maximum likelihood estimators. Ann. Inst. H. Poincaré Probab. Statist. 42 273–325.
  • Blanchard, G., Bousquet, O. and Massart, P. (2004). Statistical performance of support vector machines. Available at
  • Blanchard, G., Lugosi, G. and Vayatis, N. (2004). On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4 861–894.
  • Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323–375.
  • Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927–961.
  • Buckland, S. T., Burnham, K. P. and Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics 53 603–618.
  • Bunea, F. and Nobel, A. (2005). Sequential procedures for aggregating arbitrary estimators of a conditional mean. Technical Report M984, Dept. Statistics, Florida State Univ.
  • Bunea, F., Tsybakov, A. B. and Wegkamp, M. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
  • Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. École d'Eté de Probabilités de Saint-Flour 2001. Lecture Notes in Math. 1851. Springer, Berlin.
  • Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning and Games. Cambridge Univ. Press.
  • Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20 273–297.
  • Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge Univ. Press.
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • Dudley, R. M. (1974). Metric entropy of some classes of sets with differentiable boundaries. J. Approximation Theory 10 227–236.
  • Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407.
  • Hartigan, J. A. (2002). Bayesian regression using Akaike priors. Preprint, Dept. Statistics, Yale Univ.
  • Juditsky, A. B. and Nemirovski, A. (2000). Functional aggregation for nonparametric regression. Ann. Statist. 28 681–712.
  • Juditsky, A. B., Nazin, A. V., Tsybakov, A. B. and Vayatis, N. (2005). Recursive aggregation of estimators by the mirror descent method with averaging. Problems Inform. Transmission 41 368–384.
  • Koltchinskii, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902–1914.
  • Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization (with discussion). Ann. Statist. 34 2593–2656.
  • Koltchinskii, V. and Panchenko, D. (2000). Rademacher penalties and bounding the risk of function learning. In High Dimensional Probability II (E. Giné, D. M. Mason and J. A. Wellner, eds.) 443–457. Birkhäuser, Boston.
  • Korostelëv, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statist. 82. Springer, New York.
  • Leung, G. and Barron, A. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396–3410.
  • Lin, Y. (1999). A note on margin-based loss functions in classification. Technical Report 1029r, Dept. Statistics, Univ. Wisconsin-Madison.
  • Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 30–55.
  • Lugosi, G. and Wegkamp, M. (2004). Complexity regularization via localized random penalties. Ann. Statist. 32 1679–1697.
  • Mammen, E. and Tsybakov, A. B. (1995). Asymptotical minimax recovery of sets with smooth boundaries. Ann. Statist. 23 502–524.
  • Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829.
  • Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245–303.
  • Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin.
  • Massart, P. and Nédélec, E. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326–2366.
  • Nemirovski, A. (2000). Topics in non-parametric statistics. Lectures on Probability Theory and Statistics (Saint-Flour, 1998). Lecture Notes in Math. 1738 85–277. Springer, Berlin.
  • Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651–1686.
  • Schölkopf, B. and Smola, A. (2002). Learning with Kernels. MIT Press, Cambridge.
  • Steinwart, I. and Scovel, C. (2005). Fast rates for support vector machines. Learning Theory. Lecture Notes in Comput. Sci. 3559 279–294. Springer, Berlin.
  • Steinwart, I. and Scovel, C. (2007). Fast rates for support vector machines using Gaussian kernels. Ann. Statist. 35 575–607.
  • Tarigan, B. and van de Geer, S. A. (2006). Classifiers of support vector machine type with $l_1$ complexity regularization. Bernoulli 12 1045–1076.
  • Tsybakov, A. B. (2003). Optimal rates of aggregation. Learning Theory and Kernel Machines. Lecture Notes in Artificial Intelligence 2777 303–313. Springer, Heidelberg.
  • Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.
  • Tsybakov, A. B. and van de Geer, S. A. (2005). Square root penalty: Adaptation to the margin in classification and in edge estimation. Ann. Statist. 33 1203–1224.
  • van de Geer, S. (2000). Applications of Empirical Process Theory. Cambridge Univ. Press.
  • Vovk, V. (1990). Aggregating strategies. In Proc. Third Annual Workshop on Computational Learning Theory (Mark Fulk and John Case, eds.) 371–383. Morgan Kaufmann. San Mateo, CA.
  • Yang, Y. (2000). Combining different procedures for adaptive regression. J. Multivariate Anal. 74 135–161.
  • Yang, Y. (2000). Mixing strategies for density estimation. Ann. Statist. 28 75–87.
  • Zhang, T. (2004). Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 56–85.