The Annals of Statistics

Balanced control of generalized error rates

Joseph P. Romano and Michael Wolf

Full-text: Open access


Consider the problem of testing s hypotheses simultaneously. In this paper, we derive methods which control the generalized family-wise error rate given by the probability of k or more false rejections, abbreviated k-FWER. We derive both single-step and step-down procedures that control the k-FWER in finite samples or asymptotically, depending on the situation. Moreover, the procedures are asymptotically balanced in an appropriate sense. We briefly consider control of the average number of false rejections. Additionally, we consider the false discovery proportion (FDP), defined as the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). Here, the goal is to construct methods which satisfy, for given γ and α, P{FDP>γ}≤α, at least asymptotically. Special attention is paid to the construction of methods which implicitly take into account the dependence structure of the individual test statistics in order to further increase the ability to detect false null hypotheses. A general resampling and subsampling approach is presented which achieves these objectives, at least asymptotically.
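The two error quantities defined in the abstract can be made concrete with a small sketch. This is only an illustration of the definitions, not the resampling procedures of the paper; the function names and the `null_is_true` truth vector are hypothetical, and the truth vector is only known in a simulation setting.

```python
from typing import Sequence


def count_false_rejections(rejected: Sequence[bool],
                           null_is_true: Sequence[bool]) -> int:
    """Count rejected hypotheses whose null hypothesis is actually true."""
    if len(rejected) != len(null_is_true):
        raise ValueError("inputs must have the same length")
    return sum(1 for r, t in zip(rejected, null_is_true) if r and t)


def false_discovery_proportion(rejected: Sequence[bool],
                               null_is_true: Sequence[bool]) -> float:
    """FDP: false rejections / total rejections, defined as 0 if none are rejected."""
    total = sum(1 for r in rejected if r)
    if total == 0:
        return 0.0
    return count_false_rejections(rejected, null_is_true) / total


def k_or_more_false_rejections(rejected: Sequence[bool],
                               null_is_true: Sequence[bool],
                               k: int) -> bool:
    """Indicator of the event whose probability is the k-FWER."""
    return count_false_rejections(rejected, null_is_true) >= k


# Hypothetical outcome for s = 5 hypotheses (assumed values, for illustration only).
rejected = [True, True, True, False, False]
null_is_true = [True, False, False, True, False]

fdp = false_discovery_proportion(rejected, null_is_true)
hit_k1 = k_or_more_false_rejections(rejected, null_is_true, k=1)
hit_k2 = k_or_more_false_rejections(rejected, null_is_true, k=2)
```

In a simulation study, averaging the indicator over many replications would give a Monte Carlo estimate of the k-FWER, and the fraction of replications with FDP > γ would estimate P{FDP>γ}.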

Article information

Ann. Statist., Volume 38, Number 1 (2010), 598-633.

First available in Project Euclid: 31 December 2009

Primary: 62J15: Paired and multiple comparisons
Secondary: 62G10: Hypothesis testing

Keywords: Bootstrap; false discovery proportion; generalized family-wise error rate; multiple testing; step-down procedure


Romano, Joseph P.; Wolf, Michael. Balanced control of generalized error rates. Ann. Statist. 38 (2010), no. 1, 598–633. doi:10.1214/09-AOS734.

