## The Annals of Applied Statistics

### Bootstrapping data arrays of arbitrary order

#### Abstract

In this paper we study a bootstrap strategy for estimating the variance of a mean taken over large multifactor crossed random effects data sets. We apply bootstrap reweighting independently to the levels of each factor, giving each observation the product of independently sampled factor weights. No exact bootstrap exists for this problem [McCullagh (2000) Bernoulli 6 285–301]. We show that the proposed bootstrap is mildly conservative, meaning biased toward overestimating the variance, under sufficient conditions that allow very unbalanced and heteroscedastic inputs. Earlier results for a resampling bootstrap apply only to two factors and use multinomial weights that are poorly suited to online computation. The proposed reweighting approach can be implemented in parallel and online settings, and the results for this method apply to any number of factors. The method is illustrated using a three-factor data set of comment lengths from Facebook.
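
The reweighting scheme described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `product_weight_bootstrap_var`, its parameters, and the choice of Poisson(1) level weights are all assumptions made for this example (the abstract does not specify a weight distribution). Each factor level receives an independent random weight, each observation receives the product of the weights of its levels, and the variance of the reweighted means across replicates serves as the variance estimate.

```python
import numpy as np


def product_weight_bootstrap_var(values, factor_ids, n_boot=200, seed=0):
    """Bootstrap variance estimate for the mean of `values` (illustrative).

    values     : 1-D sequence of observations.
    factor_ids : list of 1-D sequences, one per factor; entry i is the
                 level of that factor for observation i.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)

    # Recode each factor's levels as integers 0..K-1.
    codes = [np.unique(np.asarray(ids), return_inverse=True)[1].ravel()
             for ids in factor_ids]

    boot_means = np.full(n_boot, np.nan)
    for b in range(n_boot):
        weights = np.ones(values.shape[0])
        for code in codes:
            # One independent weight per level (Poisson(1) is an assumption).
            level_weights = rng.poisson(1.0, size=code.max() + 1)
            # Each observation inherits the weight of its level, so the
            # final weight is a product over factors.
            weights *= level_weights[code]
        total = weights.sum()
        if total > 0:  # skip replicates where every observation got weight 0
            boot_means[b] = weights @ values / total

    # Sample variance of the reweighted means, ignoring skipped replicates.
    return np.nanvar(boot_means, ddof=1)
```

A call such as `product_weight_bootstrap_var(y, [f1, f2, f3])` would cover a three-factor data set, where `y`, `f1`, `f2`, and `f3` are hypothetical arrays of equal length. Because each replicate only needs the level weights and a running weighted sum, replicates can in principle be computed independently of one another, which is consistent with the parallel and online settings mentioned in the abstract.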

#### Article information

Source
Ann. Appl. Stat., Volume 6, Number 3 (2012), 895–927.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1346418567

Digital Object Identifier
doi:10.1214/12-AOAS547

Mathematical Reviews number (MathSciNet)
MR3012514

Zentralblatt MATH identifier
06096515

#### Citation

Owen, Art B.; Eckles, Dean. Bootstrapping data arrays of arbitrary order. Ann. Appl. Stat. 6 (2012), no. 3, 895–927. doi:10.1214/12-AOAS547. https://projecteuclid.org/euclid.aoas/1346418567

#### References

• Bennett, J. and Lanning, S. (2007). The Netflix prize. In Proceedings of KDD Cup and Workshop 2007 35. ACM, New York.
• Brennan, R. L., Harris, D. J. and Hanson, B. A. (1987). The bootstrap and other procedures for examining the variability of estimated variance components. Technical report, ACT.
• Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
• Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
• Lee, H. K. H. and Clyde, M. A. (2004). Lossless online Bayesian bagging. J. Mach. Learn. Res. 5 143–151.
• Mammen, E. (1992). When Does Bootstrap Work. Lecture Notes in Statistics 77. Springer, New York.
• Mammen, E. (1993). Bootstrap and wild bootstrap for high-dimensional linear models. Ann. Statist. 21 255–285.
• McCarthy, P. J. (1969). Pseudo-replication: Half samples. Review of the International Statistical Institute 37 239–264.
• McCullagh, P. (2000). Resampling and exchangeable arrays. Bernoulli 6 285–301.
• Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. J. Roy. Statist. Soc. Ser. B 56 3–48.
• Owen, A. B. (2007). The pigeonhole bootstrap. Ann. Appl. Stat. 1 386–411.
• Oza, N. and Russell, S. (2001). Online bagging and boosting. In Artificial Intelligence and Statistics 2001 105–112. Morgan Kaufmann, San Mateo, CA.
• Rubin, D. B. (1981). The Bayesian bootstrap. Ann. Statist. 9 130–134.
• Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. Wiley, New York.
• Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P. and Murthy, R. (2009). Hive: A warehousing solution over a map-reduce framework. In Proceedings of the VLDB Endowment, Vol. 2 1626–1629. VLDB Endowment.
• Wiley, E. W. (2001). Bootstrap strategies for variance component estimation: Theoretical and empirical results. Ph.D. thesis, Stanford Univ.