Institute of Mathematical Statistics Collections
A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing
In the spirit of modeling inference for microarrays as multiple testing for sparse mixtures, we present a similar approach to a simplified version of quantitative trait loci (QTL) mapping. Unlike in the case of microarrays, where the number of tests usually reaches tens of thousands, the number of tests performed in scans for QTL usually does not exceed several hundred. However, in typical cases, the sparsity p of significant alternatives for QTL mapping is in the same range as for microarrays. For methodological interest, as well as some related applications, we also consider non-sparse mixtures. Using simulations as well as theoretical observations, we study the false discovery rate (FDR), power and misclassification probability for the Benjamini-Hochberg (BH) procedure and its modifications, as well as for various parametric and nonparametric Bayes and Parametric Empirical Bayes procedures. Our results confirm the observation of Genovese and Wasserman (2002) that for small p the misclassification error of BH is close to optimal in the sense of attaining the Bayes oracle. This property is shared by some of the considered Bayes testing rules, which in general perform better than BH for large or moderate p.
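For readers unfamiliar with the BH procedure discussed above, the following is a minimal sketch of the standard step-up rule of Benjamini and Hochberg (1995): sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m, and reject the hypotheses with the k smallest p-values. The function name `benjamini_hochberg` and the default level `alpha=0.05` are illustrative assumptions, not taken from the paper, and the sketch does not cover the modified BH or Bayesian rules that the paper compares.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Illustrative sketch of the BH step-up rule; the name and the default
    alpha are assumptions made for this example.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices that sort the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m (0 if none).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the hypotheses with the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its own threshold, as long as some larger p-value meets the threshold at its rank. For example, with p-values 0.02, 0.03, 0.035, 0.04 and alpha = 0.05, the two smallest miss their thresholds (0.0125 and 0.025), yet all four are rejected because 0.04 <= 0.05 at rank 4.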
First available in Project Euclid: 1 April 2008
Copyright © 2008, Institute of Mathematical Statistics
Bogdan, Małgorzata; Ghosh, Jayanta K.; Tokdar, Surya T. A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing. Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, 211–230, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/193940307000000158. https://projecteuclid.org/euclid.imsc/1207058275
-  Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
-  Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educ. Behav. Stat. 25 60–83.
-  Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
-  Bogdan, M., Ghosh, J. K. and Doerge, R. W. (2004). Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167 989–999.
-  Chen, J. and Sarkar, S. K. (2004). Multiple testing of response rates with a control: A Bayesian stepwise approach. J. Statist. Plann. Inference 125 3–16.
-  Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
-  Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23 70–86.
-  Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
-  Elmore, R., Hall, P. and Neeman, A. (2005). An application of classical invariant theory to identifiability in nonparametric mixtures. Ann. Inst. Fourier (Grenoble) 55 1–28.
-  Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577–588.
-  Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.
-  Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 499–517.
-  Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035–1061.
-  Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer, New York.
-  Ghosh, J. K. and Sen, P. K. (1985). On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer II (Berkeley, Calif., 1983) 789–806. Wadsworth, Belmont, CA.
-  Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6 65–70.
-  Lehmann, E. L. and Romano, J. P. (2005). Generalizations of the familywise error rate. Ann. Statist. 33 1138–1154.
-  Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann. Statist. 34 373–393.
-  Müller, P., Parmigiani, G., Robert, C. and Rousseau, J. (2004). Optimal sample size for multiple testing: The case of gene expression microarrays. J. Amer. Statist. Assoc. 99 990–1001.
-  Müller, P., Parmigiani, G. and Rice, K. (2007). FDR and Bayesian multiple comparison. In Bayesian Statistics 8 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.). Oxford Univ. Press.
-  Newton, M. A. (2002). On a nonparametric recursive estimator of the mixing distribution. Sankhyā Ser. A 64 306–322.
-  Otto, S. P. and Jones, C. D. (2000). Detecting the undetected: Estimating the total number of loci underlying a quantitative trait. Genetics 156 2093–2107.
-  Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30 239–257.
-  Sarkar, S. K. (2006). False discovery and false nondiscovery rates in single-step multiple testing procedures. Ann. Statist. 34 394–415.
-  Scott, J. G. and Berger, J. O. (2006). An exploration of aspects of Bayesian multiple testing. J. Statist. Plann. Inference 136 2144–2162.
-  Seeger, P. (1968). A note on a method for the analysis of significance en masse. Technometrics 10 586–593.
-  Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73 751–754.
-  Sorić, B. (1989). Statistical “discoveries” and effect-size estimation. J. Amer. Statist. Assoc. 84 608–610.
-  Storey, J. D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479–498.
-  Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist. 31 2013–2035.
-  Storey, J. D. (2007). The optimal discovery procedure: A new approach to simultaneous significance testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 347–368.
-  Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205.
-  Yi, N. (2004). A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167 967–975.