## The Annals of Statistics

### Estimating the algorithmic variance of randomized ensembles via the bootstrap

Miles E. Lopes

#### Abstract

Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is “large enough”, in the sense that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of “algorithmic variance” (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable $\mathrm{ERR}_{t}$ denote the prediction error of a randomized ensemble of size $t$. Working under a “first-order model” for randomized ensembles, we prove that the centered law of $\mathrm{ERR}_{t}$ can be consistently approximated via the proposed method as $t\to\infty$. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\mathrm{ERR}_{t}$ are negligible.
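The idea sketched in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's actual procedure: it uses a toy ensemble of bootstrap-trained threshold classifiers as a stand-in for random forests, resamples the ensemble members with replacement to approximate the fluctuations of $\mathrm{ERR}_{t}$, and then rescales the estimated standard deviation by $\sqrt{t/t_2}$, a hypothetical extrapolation rule motivated by the $1/t$ variance scaling of independently randomized members. All variable names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: a fixed training set and a held-out test set.
n, m = 200, 500
X_train = rng.normal(size=n)
y_train = (X_train + 0.5 * rng.normal(size=n) > 0).astype(int)
X_test = rng.normal(size=m)
y_test = (X_test > 0).astype(int)

def train_randomized_stump(rng):
    """One 'bagged' classifier: fit a threshold on a bootstrap sample."""
    idx = rng.integers(0, n, size=n)
    xb, yb = X_train[idx], y_train[idx]
    # Choose the threshold minimizing error on the bootstrap sample.
    cands = np.quantile(xb, np.linspace(0.05, 0.95, 19))
    errs = [np.mean((xb > c).astype(int) != yb) for c in cands]
    return cands[int(np.argmin(errs))]

# Grow an ensemble of t members; record each member's test-set votes.
t = 50
thresholds = [train_randomized_stump(rng) for _ in range(t)]
votes = np.array([(X_test > c).astype(int) for c in thresholds])  # shape (t, m)

def ensemble_error(vote_rows):
    """Test error of the majority vote over the given member votes."""
    maj = (vote_rows.mean(axis=0) > 0.5).astype(int)
    return float(np.mean(maj != y_test))

err_t = ensemble_error(votes)

# Bootstrap over the t members (training data stays fixed) to approximate
# the algorithmic fluctuations of ERR_t.
B = 200
boot_errs = np.array([
    ensemble_error(votes[rng.integers(0, t, size=t)]) for _ in range(B)
])
sd_hat = boot_errs.std()

# Extrapolation: if algorithmic variance decays like 1/t, the estimate at
# size t predicts the fluctuations at a larger size t2 after rescaling.
t2 = 500
sd_hat_t2 = sd_hat * np.sqrt(t / t2)
print(f"ERR_t = {err_t:.3f}, bootstrap sd at t={t}: {sd_hat:.4f}, "
      f"extrapolated sd at t={t2}: {sd_hat_t2:.4f}")
```

The point of the extrapolation step is computational: the bootstrap is run once on a small ensemble, and the fluctuation estimate for a much larger ensemble is obtained by rescaling rather than by retraining.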

#### Article information

Source
Ann. Statist., Volume 47, Number 2 (2019), 1088–1112.

Dates
Revised: February 2018
First available in Project Euclid: 11 January 2019

https://projecteuclid.org/euclid.aos/1547197249

Digital Object Identifier
doi:10.1214/18-AOS1707

Mathematical Reviews number (MathSciNet)
MR3909961

Zentralblatt MATH identifier
07033162

#### Citation

Lopes, Miles E. Estimating the algorithmic variance of randomized ensembles via the bootstrap. Ann. Statist. 47 (2019), no. 2, 1088--1112. doi:10.1214/18-AOS1707. https://projecteuclid.org/euclid.aos/1547197249

#### References

• Arlot, S. and Genuer, R. (2014). Analysis of purely random forests bias. Preprint, arXiv:1407.3939.
• Biau, G. (2012). Analysis of a random forests model. J. Mach. Learn. Res. 13 1063–1095.
• Biau, G., Devroye, L. and Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 9 2015–2033.
• Bickel, P. J. and Yahav, J. A. (1988). Richardson extrapolation and the bootstrap. J. Amer. Statist. Assoc. 83 387–393.
• Breiman, L. (1996). Bagging predictors. Mach. Learn. 24 123–140.
• Breiman, L. (2001). Random forests. Mach. Learn. 45 5–32.
• Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Advanced Books and Software, Belmont, CA.
• Brezinski, C. and Zaglia, M. R. (2013). Extrapolation Methods: Theory and Practice. North-Holland, Amsterdam.
• Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927–961.
• Buja, A. and Stuetzle, W. (2000). Smoothing effects of bagging. Preprint, AT&T Labs-Research, Florham Park, NJ.
• Buja, A. and Stuetzle, W. (2006). Observations on bagging. Statist. Sinica 16 323–351.
• Bürgisser, P. and Cucker, F. (2013). Condition: The Geometry of Numerical Algorithms. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 349. Springer, Heidelberg.
• Byrd, R. H., Chin, G. M., Nocedal, J. and Wu, Y. (2012). Sample size selection in optimization methods for machine learning. Math. Program. 134 127–155.
• Cannings, T. I. and Samworth, R. J. (2017). Random-projection ensemble classification. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 959–1035. With discussions and a reply by the authors.
• Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40 139–157.
• Efron, B. (2014). Estimation and accuracy after model selection. J. Amer. Statist. Assoc. 109 991–1007.
• Genuer, R. (2012). Variance reduction in purely random forests. J. Nonparametr. Stat. 24 543–562.
• Hall, P. and Samworth, R. J. (2005). Properties of bagged nearest neighbour classifiers. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 363–379.
• Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
• Hernández-Lobato, D., Martínez-Muñoz, G. and Suárez, A. (2013). How large should ensembles of classifiers be? Pattern Recognit. 46 1323–1336.
• Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20 832–844.
• Kallenberg, O. (2006). Foundations of Modern Probability. Springer, Berlin.
• Lam, L. and Suen, C. Y. (1997). Application of majority voting to pattern recognition: An analysis of its behavior and performance. IEEE Trans. Syst. Man Cybern., Part A, Syst. Hum. 27 553–568.
• Latinne, P., Debeir, O. and Decaestecker, C. (2001). Limiting the number of trees in random forests. In Multiple Classifier Systems (Cambridge, 2001). Lecture Notes in Computer Science 2096 178–187. Springer, Berlin.
• Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News 2 18–22.
• Lichman, M. (2013). UCI machine learning repository.
• Lin, Y. and Jeon, Y. (2006). Random forests and adaptive nearest neighbors. J. Amer. Statist. Assoc. 101 578–590.
• Lopes, M. E. (2016). A sharp bound on the computation-accuracy tradeoff for majority voting ensembles. Preprint, arXiv:1303.0727.
• Lopes, M. E. (2019). Supplement to “Estimating the algorithmic variance of randomized ensembles via the bootstrap.” DOI:10.1214/18-AOS1707SUPP.
• Lopes, M. E., Wang, S. and Mahoney, M. W. (2017). A bootstrap method for error estimation in randomized matrix multiplication. Preprint, arXiv:1708.01945.
• Lopes, M. E., Wang, S. and Mahoney, M. W. (2018). Error estimation for randomized least-squares algorithms via the bootstrap. In Proceedings of the 35th International Conference on Machine Learning 80 3217–3226.
• Mentch, L. and Hooker, G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J. Mach. Learn. Res. 17 Paper No. 26, 41 pp.
• Ng, A. Y. and Jordan, M. I. (2001). Convergence rates of the Voting Gibbs classifier, with application to Bayesian feature selection. In Proceedings of the 18th International Conference on Machine Learning 377–384.
• Oshiro, T. M., Perez, P. S. and Baranauskas, J. A. (2012). How many trees in a random forest? In Machine Learning and Data Mining in Pattern Recognition 154–168. Springer, Berlin.
• Schapire, R. E. and Freund, Y. (2012). Boosting: Foundations and Algorithms. MIT Press, Cambridge, MA.
• Scornet, E. (2016a). On the asymptotics of random forests. J. Multivariate Anal. 146 72–83.
• Scornet, E. (2016b). Random forests and kernel methods. IEEE Trans. Inform. Theory 62 1485–1500.
• Scornet, E., Biau, G. and Vert, J.-P. (2015). Consistency of random forests. Ann. Statist. 43 1716–1741.
• Sexton, J. and Laake, P. (2009). Standard errors for bagged and random forest estimators. Comput. Statist. Data Anal. 53 801–811.
• Sidi, A. (2003). Practical Extrapolation Methods: Theory and Applications. Cambridge Monographs on Applied and Computational Mathematics 10. Cambridge Univ. Press, Cambridge.
• Simon, L. (1983). Lectures on Geometric Measure Theory. Proceedings of the Centre for Mathematical Analysis, Australian National University 3. Australian National Univ., Centre for Mathematical Analysis, Canberra.
• van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
• Wager, S., Hastie, T. and Efron, B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. J. Mach. Learn. Res. 15 1625–1651.
• White, B. (2016). Introduction to minimal surface theory. In Geometric Analysis. IAS/Park City Math. Ser. 22 387–438. Amer. Math. Soc., Providence, RI.

#### Supplemental materials

• Supplement: Supplementary Material for “Estimating the algorithmic variance of randomized ensembles via the bootstrap”. The supplement contains proofs of all theoretical results and an assessment of technical assumptions.