The Annals of Statistics

Size, power and false discovery rates

Bradley Efron



Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the false discovery rate (fdr) analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr’s, theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why nonnull cases might easily fail to appear on a list of “significant” discoveries.
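To make the tail-area quantity concrete, here is a minimal sketch of an empirical Bayes tail-area Fdr estimate, computed as the null tail probability divided by the observed tail proportion. It is an illustration under stated assumptions, not the paper's procedure: it takes a theoretical N(0,1) null, sets the null proportion `p0` conservatively to 1, and runs on simulated z-values. The function names, the simulated two-groups mixture, and its parameters are all hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, used here as the theoretical null F0."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tail_area_fdr(z_values, cutoff, p0=1.0):
    """Estimate the right-tail Fdr at `cutoff`.

    Computes p0 * (1 - F0(cutoff)) / (1 - Fhat(cutoff)), where Fhat is the
    empirical CDF of z_values. p0 = 1.0 is a conservative stand-in for the
    unknown proportion of null cases (an assumption of this sketch).
    """
    n_exceed = sum(1 for z in z_values if z >= cutoff)
    if n_exceed == 0:
        return 1.0  # no cases in the tail, so nothing to call a discovery
    null_tail = 1.0 - normal_cdf(cutoff)
    empirical_tail = n_exceed / len(z_values)
    return min(1.0, p0 * null_tail / empirical_tail)


def simulate_z_values(n_null=9500, n_nonnull=500, shift=3.0, seed=0):
    """Hypothetical two-groups data: N(0,1) nulls plus N(shift,1) non-nulls."""
    rng = random.Random(seed)
    nulls = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    nonnulls = [rng.gauss(shift, 1.0) for _ in range(n_nonnull)]
    return nulls + nonnulls


z = simulate_z_values()
for c in (1.0, 2.0, 3.0):
    print(f"cutoff {c:.1f}: estimated Fdr = {tail_area_fdr(z, c):.3f}")
```

In this simulation the estimate falls as the cutoff rises, because the null tail shrinks faster than the observed tail. Since half of the simulated non-null cases lie below a cutoff of 3, the sketch also hints at why genuinely non-null cases can fail to appear on a list of "significant" discoveries.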

Article information

Ann. Statist., Volume 35, Number 4 (2007), 1351-1377.

First available in Project Euclid: 29 August 2007


Primary: 62J07: Ridge regression; shrinkage estimators 62G07: Density estimation

Keywords: local false discovery rates, empirical Bayes, large-scale simultaneous inference, empirical null


Efron, Bradley. Size, power and false discovery rates. Ann. Statist. 35 (2007), no. 4, 1351–1377. doi:10.1214/009053606000001460.


