## Institute of Mathematical Statistics Lecture Notes - Monograph Series

- Optimality
- 2009, 304-325

### Parametric Mixture Models for Estimating the Proportion of True Null Hypotheses and Adaptive Control of FDR

Ajit C. Tamhane and Jiaxiao Shi

#### Abstract

Estimation of the proportion or the number of true null hypotheses is an important problem in multiple testing, especially when the number of hypotheses is large. Wu, Guan and Zhao [*Biometrics* **62** (2006) 735–744] found that nonparametric approaches are too conservative. To address this problem, we study two parametric mixture models (normal and beta) for the distributions of the test statistics or their *p*-values. The components of the mixture are the null and alternative distributions, with mixing proportions $\pi_0$ and $1-\pi_0$ respectively, where $\pi_0$ is the unknown proportion to be estimated. The normal model assumes that the test statistics from the true null hypotheses are i.i.d. $N(0, 1)$, while those from the alternative hypotheses are i.i.d. $N(\delta, 1)$ with $\delta \neq 0$. The beta model assumes that the *p*-values from the null hypotheses are i.i.d. $U[0, 1]$ and those from the alternative hypotheses are i.i.d. $\mathrm{Beta}(a, b)$ with $a < 1 < b$. All parameters are assumed to be unknown. Three methods of estimating $\pi_0$ are developed for each model. Via simulation, the methods are compared with each other and with Storey's [*J. Roy. Statist. Soc. Ser. B* **64** (2002) 479–498] nonparametric method in terms of the bias and mean square error of the estimators of $\pi_0$ and the achieved FDR. Robustness of the estimators to model violations is also studied by generating data from other models. For the normal model, the parametric methods perform better than Storey's method, with the EM method (Dempster, Laird and Rubin [*J. Roy. Statist. Soc. Ser. B* **39** (1977) 1–38]) performing best overall when the assumed model holds; however, it is not very robust to significant model violations. For the beta model, the parametric methods do not perform as well because of difficulties in estimating the parameters, and Storey's nonparametric method turns out to be the winner in many cases. Therefore the beta model is not recommended for use in practice.
An example is given to illustrate the methods.
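To make the normal model concrete, the sketch below simulates $z$-statistics from $\pi_0 N(0,1) + (1-\pi_0) N(\delta,1)$ and estimates $(\pi_0, \delta)$ with the textbook EM updates for a two-component normal mixture with unit variances. It is a minimal illustration under stated assumptions, not the authors' implementation: the starting values, the stopping rule, the use of two-sided *p*-values, Storey's tuning constant $\lambda = 0.5$, and the plug-in adaptive step (running the Benjamini–Hochberg step-up procedure at level $\alpha/\hat\pi_0$) are all choices made here, and every function name is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def simulate_normal_mixture(m, pi0, delta, seed=0):
    """Draw m z-statistics: N(0, 1) with probability pi0, otherwise N(delta, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if rng.random() < pi0 else rng.gauss(delta, 1.0)
            for _ in range(m)]


def em_normal_pi0(z, pi0=0.5, delta=1.0, tol=1e-8, max_iter=1000):
    """EM for pi0 * N(0, 1) + (1 - pi0) * N(delta, 1); returns (pi0_hat, delta_hat).

    Starting values, tolerance and iteration cap are illustrative assumptions.
    """
    for _ in range(max_iter):
        alt = NormalDist(delta, 1.0)
        # E-step: posterior probability that each statistic comes from a true null.
        w = []
        for x in z:
            f0 = pi0 * STD_NORMAL.pdf(x)
            f1 = (1.0 - pi0) * alt.pdf(x)
            total = f0 + f1
            w.append(f0 / total if total > 0.0 else 1.0)  # guard against underflow
        # M-step: closed-form updates of the mixing weight and the alternative mean.
        new_pi0 = sum(w) / len(z)
        alt_weight = max(sum(1.0 - wi for wi in w), 1e-12)
        new_delta = sum((1.0 - wi) * x for wi, x in zip(w, z)) / alt_weight
        converged = abs(new_pi0 - pi0) < tol and abs(new_delta - delta) < tol
        pi0, delta = new_pi0, new_delta
        if converged:
            break
    return pi0, delta


def two_sided_pvalues(z):
    """Two-sided p-values 2 * (1 - Phi(|z|)); uniform on [0, 1] under N(0, 1)."""
    return [2.0 * (1.0 - STD_NORMAL.cdf(abs(x))) for x in z]


def storey_pi0(pvals, lam=0.5):
    """Storey's estimator #{p_i > lam} / (m (1 - lam)), capped at 1."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))


def adaptive_bh(pvals, alpha, pi0_hat):
    """BH step-up at level alpha / pi0_hat; returns sorted indices of rejections."""
    m = len(pvals)
    pi0_hat = max(pi0_hat, 1.0 / m)  # avoid dividing by a zero estimate
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its step-up threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * pi0_hat):
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    z = simulate_normal_mixture(m=4000, pi0=0.8, delta=3.0, seed=1)
    pi0_em, delta_em = em_normal_pi0(z)
    p = two_sided_pvalues(z)
    print(f"EM: pi0 = {pi0_em:.3f}, delta = {delta_em:.3f}")
    print(f"Storey (lambda = 0.5): pi0 = {storey_pi0(p):.3f}")
    print(f"BH rejections: {len(adaptive_bh(p, 0.05, 1.0))}, "
          f"adaptive BH rejections: {len(adaptive_bh(p, 0.05, pi0_em))}")
```

With a clear separation such as $\delta = 3$, the EM estimate of $\pi_0$ should land close to the simulated value. Shrinking $\delta$ toward 0 makes the two components overlap, which slows EM convergence and inflates the variability of $\hat\pi_0$; this is one way to see why estimation becomes harder when the alternatives are weak. Because $\hat\pi_0 \le 1$, the adaptive step can only enlarge the set of BH rejections.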

#### Chapter information

**Source**

*Optimality: The Third Erich L. Lehmann Symposium* (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2009)

**Dates**

First available in Project Euclid: 3 August 2009

**Permanent link to this document**

https://projecteuclid.org/euclid.lnms/1249305336

**Digital Object Identifier**

doi:10.1214/09-LNMS5718

**Zentralblatt MATH identifier**

1271.62034

**Subjects**

Primary: 62F10: Point estimation

Secondary: 62F12: Asymptotic properties of estimators

**Keywords**

Beta model; bias-correction; EM method; mixture model; false discovery rate (FDR); least squares method; maximum likelihood method; normal model; *p*-values

**Rights**

Copyright © 2009, Institute of Mathematical Statistics

#### Citation

Tamhane, Ajit C.; Shi, Jiaxiao. Parametric Mixture Models for Estimating the Proportion of True Null Hypotheses and Adaptive Control of FDR. In *Optimality* (Javier Rojo, ed.), 304–325, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2009. doi:10.1214/09-LNMS5718. https://projecteuclid.org/euclid.lnms/1249305336

#### References

- [1] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. *J. Roy. Statist. Soc. Ser. B* **57** 289–300.
- [2] Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. *J. Educational and Behavioral Statist.* **25** 60–83.
- [3] Black, M. A. (2004). A note on the adaptive control of false discovery rates. *J. Roy. Statist. Soc. Ser. B* **66** 297–304.
- [4] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. *J. Roy. Statist. Soc. Ser. B* **39** 1–38.
- [5] Finner, H. and Roters, M. (2001). On the false discovery rate and expected type I error. *Biom. J.* **43** 985–1005.
- [6] Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. *J. Roy. Statist. Soc. Ser. B* **64** 499–517.
- [7] Gill, P. E., Murray, W. and Wright, M. H. (1981). *Practical Optimization*. Academic Press, London and New York.
- [8] Guan, Z., Wu, B. and Zhao, H. (2004). Model-based approach to FDR estimation. Technical Report 2004-016, Division of Biostatistics, Univ. of Minnesota, Minneapolis, MN.
- [9] Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. *Biometrika* **75** 800–803.
- [10] Hochberg, Y. and Benjamini, Y. (1990). More powerful procedures for multiple significance testing. *Statist. Med.* **9** 811–818.
- [11] Hochberg, Y. and Tamhane, A. C. (1987). *Multiple Comparison Procedures*. Wiley, New York.
- [12] Hsueh, H., Chen, J. J. and Kodell, R. L. (2003). Comparison of methods for estimating the number of true null hypotheses in multiple testing. *J. Biopharm. Statist.* **13** 675–689.
- [13] Hung, H. M., O'Neill, R. T., Bauer, P. and Kohne, K. (1997). The behavior of the *p*-value when the alternative hypothesis is true. *Biometrics* **53** 11–22.
- [14] Iyer, V. and Sarkar, S. (2007). An adaptive single-step FDR procedure with applications to DNA microarray analysis. *Biom. J.* **49** 127–135.
- [15] Jiang, H. and Doerge, R. W. (2005). Estimating the proportion of the true null hypotheses for multiple comparisons. Preprint.
- [16] Johnson, N. L. and Kotz, S. (1970). *Continuous Univariate Distributions* **I**. Wiley, New York.
- [17] Langaas, M., Lindqvist, B. H. and Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. *J. Roy. Statist. Soc. Ser. B* **67** 555–572.
- [18] Schweder, T. and Spjøtvoll, E. (1982). Plots of *p*-values to evaluate many tests simultaneously. *Biometrika* **69** 493–502.
- [19] Shaffer, J. P. (2005). Multiple requirements for multiple test procedures. Paper presented at the IVth International Conference on Multiple Comparison Procedures, Shanghai, China.
- [20] Shi, J. (2006). Improved Estimation of the Proportion of True Null Hypotheses with Applications to Adaptive Control of FDR and Drug Screening. Doctoral dissertation, Department of Statistics, Northwestern University, Evanston, IL.
- [21] Storey, J. (2002). A direct approach to false discovery rates. *J. Roy. Statist. Soc. Ser. B* **64** 479–498.
- [22] Storey, J., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. *J. Roy. Statist. Soc. Ser. B* **66** 187–205.
- [23] Turkheimer, F. E., Smith, C. B. and Schmidt, K. (2001). Estimation of the number of 'true' null hypotheses in multivariate analysis of neuroimaging data. *NeuroImage* **13** 920–930.
- [24] Wu, B., Guan, Z. and Zhao, H. (2006). Parametric and nonparametric FDR estimation revisited. *Biometrics* **62** 735–744.