Institute of Mathematical Statistics Collections

Multiple testing procedures under confounding

Debashis Ghosh

While multiple testing procedures have been the focus of much statistical research, an important facet of the problem is how to deal with possible confounding. Procedures for this setting have been developed by authors in both genetics and statistics, and in this chapter we relate these proposals. We then propose two new multiple testing approaches within this framework. The first combines sensitivity analysis methods with false discovery rate estimation procedures. The second constructs shrinkage estimators that exploit the mixture model for multiple testing. Both procedures are illustrated with applications to a gene expression profiling experiment in prostate cancer.

Chapter information

N. Balakrishnan, Edsel A. Peña and Mervyn J. Silvapulle, eds., Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2008), 243-256

First available in Project Euclid: 1 April 2008

Primary: 62P10: Applications to biology and medical sciences
Secondary: 92D10: Genetics {For genetic algebras, see 17D92}

Keywords: association studies; empirical null hypothesis; multiple comparisons; statistical genomics

Copyright © 2008, Institute of Mathematical Statistics


Ghosh, Debashis. Multiple testing procedures under confounding. Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, 243–256, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/193940307000000176.
