Bernoulli, Volume 17, Number 1 (2011), 347-394.

Simultaneous critical values for $t$-tests in very high dimensions

Hongyuan Cao and Michael R. Kosorok


Abstract

This article considers the problem of multiple hypothesis testing using $t$-tests. The observed data are assumed to be independently generated conditional on an underlying and unknown two-state hidden model. We propose an asymptotically valid data-driven procedure to find critical values for rejection regions controlling the $k$-familywise error rate ($k$-FWER), false discovery rate (FDR) and the tail probability of false discovery proportion (FDTP) by using one-sample and two-sample $t$-statistics. We only require a finite fourth moment plus some very general conditions on the mean and variance of the population by virtue of the moderate deviations properties of $t$-statistics. A new consistent estimator for the proportion of alternative hypotheses is developed. Simulation studies support our theoretical results and demonstrate that the power of a multiple testing procedure can be substantially improved by using critical values directly, as opposed to the conventional $p$-value approach. Our method is applied in an analysis of the microarray data from a leukemia cancer study that involves testing a large number of hypotheses simultaneously.
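To make the idea of choosing a critical value directly on the scale of the $t$-statistics concrete, the following is a minimal sketch and not the procedure developed in the article. It assumes a standard normal tail approximation for the null distribution of the $t$-statistics and a conservative null proportion `pi0 = 1`; under those assumptions the rule is the Benjamini-Hochberg step-up procedure rewritten as a threshold on $|T_i|$. The function names, the simulated data and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def one_sample_t(row):
    """One-sample t-statistic for H0: mean = 0."""
    n = len(row)
    return math.sqrt(n) * mean(row) / stdev(row)


def fdr_critical_value(t_stats, alpha=0.05, pi0=1.0):
    """Smallest observed |t| whose plug-in FDR estimate is at most alpha.

    When rejecting every hypothesis with |T_i| >= c, the expected number of
    false rejections is approximated by m * pi0 * 2 * (1 - Phi(c)), where the
    normal tail Phi is an assumption of this sketch. The estimate divides
    that count by the observed number of rejections at c.
    Returns None when no threshold qualifies.
    """
    norm = NormalDist()
    abs_t = sorted((abs(t) for t in t_stats), reverse=True)
    m = len(abs_t)
    best = None
    for k, c in enumerate(abs_t, start=1):  # k rejections at threshold c
        est_fdr = m * pi0 * 2.0 * (1.0 - norm.cdf(c)) / k
        if est_fdr <= alpha:
            best = c  # a lower threshold still meets the target
    return best


# Hypothetical simulated data: 500 hypotheses, 20 observations each,
# with the first 50 having a nonzero mean.
random.seed(0)
m, n = 500, 20
data = [
    [random.gauss(1.0 if i < 50 else 0.0, 1.0) for _ in range(n)]
    for i in range(m)
]
t_stats = [one_sample_t(row) for row in data]
crit = fdr_critical_value(t_stats, alpha=0.05)
if crit is None:
    print("no rejections at this level")
else:
    rejected = sum(abs(t) >= crit for t in t_stats)
    print(f"critical value {crit:.3f}, {rejected} rejections")
```

With only 20 observations per hypothesis the normal tail is a rough stand-in for the exact null distribution, so the sketch should be read as an illustration of thresholding the statistics themselves rather than as a calibrated method.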

Article information

Source
Bernoulli, Volume 17, Number 1 (2011), 347-394.

Dates
First available in Project Euclid: 8 February 2011

Permanent link to this document
https://projecteuclid.org/euclid.bj/1297173846

Digital Object Identifier
doi:10.3150/10-BEJ272

Mathematical Reviews number (MathSciNet)
MR2797995

Zentralblatt MATH identifier
1284.62469

Keywords
empirical processes; FDR; high dimension; microarrays; multiple hypothesis testing; one-sample $t$-statistics; self-normalized moderate deviation; two-sample $t$-statistics

Citation

Cao, Hongyuan; Kosorok, Michael R. Simultaneous critical values for $t$-tests in very high dimensions. Bernoulli 17 (2011), no. 1, 347-394. doi:10.3150/10-BEJ272. https://projecteuclid.org/euclid.bj/1297173846


