## The Annals of Statistics

### Population theory for boosting ensembles

Leo Breiman

#### Abstract

Tree ensembles are looked at in distribution space, that is, the limit case of "infinite" sample size. It is shown that the simplest kind of trees is complete in $D$-dimensional $L_2(P)$ space if the number of terminal nodes $T$ is greater than $D$. For such trees we show that the AdaBoost algorithm gives an ensemble converging to the Bayes risk.
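
The ensembles studied in the abstract are built by AdaBoost acting on small trees. As a concrete reference point, the sketch below is a minimal sample-based AdaBoost with decision stumps (trees with two terminal nodes); the exhaustive stump search and all function names are illustrative assumptions, not the paper's population-level construction, which works with the distribution $P$ rather than a finite sample.

```python
# Minimal sketch of sample-based AdaBoost with decision stumps
# (two terminal nodes); illustrative only, assuming labels y in {-1, +1}.
import numpy as np

def fit_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with smallest weighted 0-1 error."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)          # (error, feature, threshold, sign)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds=50):
    """Return a list of (alpha, feature, threshold, sign) weak learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)             # start from the uniform weighting
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)  # up-weight the misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of the stumps."""
    F = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        F += alpha * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(F)
```

In the population setting of the paper, the empirical weights `w` are replaced by a reweighted version of $P$ itself; the finite-sample routine above is only the usual data-based analogue.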

#### Article information

Source: Ann. Statist., Volume 32, Number 1 (2004), 1--11.

Dates: First available in Project Euclid: 12 March 2004

Permanent link: https://projecteuclid.org/euclid.aos/1079120126

Digital Object Identifier: doi:10.1214/aos/1079120126

Mathematical Reviews number (MathSciNet): MR2050998

Zentralblatt MATH identifier: 1105.62308

#### Citation

Breiman, Leo. Population theory for boosting ensembles. Ann. Statist. 32 (2004), no. 1, 1--11. doi:10.1214/aos/1079120126. https://projecteuclid.org/euclid.aos/1079120126

#### References

• Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning 36 105--139.
• Breiman, L. (1996). Bagging predictors. Machine Learning 24 123--140.
• Breiman, L. (1997). Arcing the edge. Technical Report 486, Dept. Statistics, Univ. California, Berkeley. Available at www.stat.berkeley.edu.
• Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801--849.
• Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493--1517.
• Breiman, L. (2000). Some infinite theory for predictor ensembles. Technical Report 577, Dept. Statistics, Univ. California, Berkeley.
• Bühlmann, P. and Yu, B. (2003). Boosting with the $L_2$ loss: Regression and classification. J. Amer. Statist. Assoc. 98 324--339.
• Dietterich, T. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning 40 139--157.
• Drucker, H. and Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems 8 479--485. MIT Press, Cambridge, MA.
• Dunford, N. and Schwartz, J. (1958). Linear Operators. I. Interscience Publishers, New York.
• Forsythe, G. E. and Wasow, W. R. (1960). Finite-Difference Methods for Partial Differential Equations. Wiley, New York.
• Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning 148--156. Morgan Kaufmann, San Francisco.
• Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337--407.
• Jiang, W. (2004). Process consistency for AdaBoost. Ann. Statist. 32 13--29.
• Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 30--55.
• Mannor, S., Meir, R. and Zhang, T. (2002). The consistency of greedy algorithms for classification. In Proc. 15th Annual Conference on Computational Learning Theory. Lecture Notes in Comp. Sci. 2375 319--333. Springer, New York.
• Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651--1686.
• Schapire, R. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning 37 297--336.
• Wheway, V. (1999). Variance reduction trends on "boosted" classifiers. Unpublished manuscript.
• Zhang, T. and Yu, B. (2003). Boosting with early stopping: Convergence and consistency. Technical Report 635, Dept. Statistics, Univ. California, Berkeley. Available from www.stat.berkeley.edu/~binyu/publications.html.