The Annals of Statistics

Population theory for boosting ensembles

Leo Breiman



Tree ensembles are studied in distribution space, that is, in the limiting case of "infinite" sample size. It is shown that the simplest kind of trees is complete in $D$-dimensional $L_2(P)$ space if the number of terminal nodes $T$ is greater than $D$. For such trees, we show that the AdaBoost algorithm gives an ensemble converging to the Bayes risk.
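For readers unfamiliar with the algorithm named in the abstract, the following is a minimal finite-sample sketch of standard discrete AdaBoost with two-terminal-node trees (stumps) on one-dimensional data, so that $T = 2 > D = 1$. It is an illustration only, not the population-level construction analyzed in the paper; the function names `fit_stump`, `adaboost` and `predict` and the toy data are assumptions introduced here.

```python
import math

def stump_predict(x, threshold, sign):
    """A two-terminal-node tree on 1-D input: +sign right of the split, -sign left."""
    return sign if x > threshold else -sign

def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign) of the best stump under weights w."""
    points = sorted(set(xs))
    # Candidate splits: one below all points, then midpoints between neighbors.
    thresholds = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, t, sign) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, threshold, sign)]."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Up-weight misclassified points, down-weight the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, sign))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): the positive class is an interval, so no single stump fits it.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ys = [-1, -1, -1, 1, 1, 1, -1, -1, -1]
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    errors = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(rounds, "rounds:", errors, "training errors")
```

The toy labels are chosen so that a single stump cannot separate the classes, while a weighted combination of a few stumps can. This is a small-sample analogue of the idea that combinations of simple trees can represent richer functions than any one tree.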

Article information

Ann. Statist., Volume 32, Number 1 (2004), 1-11.

First available in Project Euclid: 12 March 2004


Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}
68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords: Trees; AdaBoost; Bayes risk


Breiman, Leo. Population theory for boosting ensembles. Ann. Statist. 32 (2004), no. 1, 1--11. doi:10.1214/aos/1079120126.


