The Annals of Statistics

Risk bounds for statistical learning

Pascal Massart and Élodie Nédélec

Full-text: Open access


We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We focus mainly on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to measure the “size” of a class of classifiers in ways other than the entropy with bracketing used in Tsybakov’s work. In particular, we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions, and we discuss the optimality of these bounds in a minimax sense.
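
As a minimal illustration, not taken from the paper, the Python sketch below computes an ERM over one-sided threshold classifiers x ↦ 1{x ≥ t} on the real line, a VC-class of VC-dimension 1. The simulated model is an assumption made for illustration only: X is uniform on [0, 1] and the regression function η(x) = P(Y = 1 | X = x) equals 1/2 + h/2 for x ≥ 1/2 and 1/2 - h/2 otherwise, so that |2η(x) - 1| = h everywhere, a margin-type condition with margin h. The function names (empirical_risk, erm_threshold, sample, excess_risk) are hypothetical.

```python
import random


def empirical_risk(t, xs, ys):
    """Empirical 0-1 risk of the rule x -> 1 if x >= t else 0."""
    errors = sum(1 for x, y in zip(xs, ys) if (1 if x >= t else 0) != y)
    return errors / len(xs)


def erm_threshold(xs, ys):
    """Empirical risk minimizer over one-sided threshold rules.

    The empirical risk is constant for t between consecutive sample points,
    so it suffices to try every sample point plus +inf (always predict 0).
    Ties are broken in favour of the smallest threshold.
    """
    candidates = sorted(set(xs)) + [float("inf")]
    return min(candidates, key=lambda t: empirical_risk(t, xs, ys))


def sample(n, h, seed=0):
    """Draw n pairs (X, Y) from the assumed model with |2*eta(X) - 1| = h."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.random()  # X ~ Uniform[0, 1)
        eta = 0.5 + h / 2 if x >= 0.5 else 0.5 - h / 2  # P(Y = 1 | X = x)
        xs.append(x)
        ys.append(1 if rng.random() < eta else 0)
    return xs, ys


def excess_risk(t, h):
    """True excess risk of 1{x >= t} over the Bayes rule 1{x >= 1/2} (assumed model).

    The two rules disagree on the interval between t and 1/2, clipped to [0, 1],
    where the pointwise excess loss is |2*eta(x) - 1| = h.
    """
    return h * abs(min(max(t, 0.0), 1.0) - 0.5)


if __name__ == "__main__":
    h = 0.6
    for n in (100, 1000):
        xs, ys = sample(n, h, seed=1)
        t_hat = erm_threshold(xs, ys)
        print(n, round(t_hat, 3), round(excess_risk(t_hat, h), 4))
```

For instance, erm_threshold([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1]) returns 0.3: every threshold in (0.2, 0.3] separates that sample perfectly, and the scan keeps the smallest candidate achieving the minimum. The exhaustive O(n²) scan is chosen for readability; sorting the sample once and updating error counts incrementally would give an O(n log n) version.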

Article information

Ann. Statist., Volume 34, Number 5 (2006), 2326-2366.

First available in Project Euclid: 23 January 2007

Digital Object Identifier: doi:10.1214/009053606000000786

Primary: 60E15 (Inequalities; stochastic orderings)
Secondary: 60F10 (Large deviations); 94A17 (Measures of information, entropy)

Keywords: classification; concentration inequalities; empirical processes; entropy with bracketing; minimax estimation; model selection; pattern recognition; regression estimation; statistical learning; structural minimization of the risk; VC-class; VC-dimension


Massart, Pascal; Nédélec, Élodie. Risk bounds for statistical learning. Ann. Statist. 34 (2006), no. 5, 2326--2366. doi:10.1214/009053606000000786.

  • Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301--413.
  • Birgé, L. (2005). A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 1611--1615.
  • Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 329--375.
  • Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495--500.
  • Devroye, L. and Lugosi, G. (1995). Lower bounds in pattern recognition and learning. Pattern Recognition 28 1011--1018.
  • Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press.
  • Edelsbrunner, H. (1987). Algorithms in Combinatorial Geometry. Springer, Berlin.
  • Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik--Chervonenkis dimension. J. Combin. Theory Ser. A 69 217--232.
  • Haussler, D., Littlestone, N. and Warmuth, M. (1994). Predicting $\{0,1\}$-functions on randomly drawn points. Inform. and Comput. 115 248--292.
  • Koltchinskii, V. I. (1981). On the central limit theorem for empirical measures. Theor. Probab. Math. Statist. 24 71--82.
  • Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statist. 82. Springer, New York.
  • Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Isoperimetry and Processes. Springer, Berlin.
  • Lugosi, G. (2002). Pattern classification and learning theory. In Principles of Nonparametric Learning (L. Györfi, ed.) 1--56. Springer, Vienna.
  • Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808--1829.
  • Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245--303.
  • Massart, P. (2006). Concentration inequalities and model selection. Lectures on Probability Theory and Statistics. École d'Été de Probabilités de Saint-Flour XXXIII. Lecture Notes in Math. 1896. Springer, Berlin. To appear.
  • Massart, P. and Rio, E. (1998). A uniform Marcinkiewicz--Zygmund strong law of large numbers for empirical processes. In Festschrift for Miklós Csörgő: Asymptotic Methods in Probability and Statistics (B. Szyszkowicz, ed.) 199--211. North-Holland, Amsterdam.
  • McDiarmid, C. (1989). On the method of bounded differences. In Surveys in Combinatorics 1989 (J. Siemons, ed.) 148--188. Cambridge Univ. Press.
  • Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A 33 235--248.
  • Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103--153.
  • Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505--563.
  • Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135--166.
  • Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer, New York.
  • Vapnik, V. N. and Chervonenkis, A. Ya. (1974). Theory of Pattern Recognition. Nauka, Moscow. (In Russian.)
  • Yang, Y. and Barron, A. R. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564--1599.
  • Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423--435. Springer, New York.