Electronic Journal of Statistics

New FDR bounds for discrete and heterogeneous tests

Sebastian Döhler, Guillermo Durand, and Etienne Roquain


Abstract

To find interesting items in genome-wide association studies or next-generation sequencing data, a crucial point is to design powerful false discovery rate (FDR) controlling procedures that suitably combine discrete tests (typically binomial or Fisher tests). In particular, recent research has sought modifications of the classical Benjamini-Hochberg (BH) step-up procedure that accommodate the discreteness and heterogeneity of the data. However, despite a large number of attempts, these procedures did not come with theoretical guarantees. In this paper, we provide new FDR bounds that fill this gap. More specifically, these bounds make it possible to construct BH-type procedures that incorporate the discrete and heterogeneous structure of the data and provably control the FDR for any fixed number of null hypotheses (under independence). Notably, our FDR-controlling methodology also makes it possible to incorporate the quantity of signal in the data (yielding a so-called $\pi_{0}$-adaptive procedure) and to recover some prominent results from the literature. The power advantage of the new methods is demonstrated in a numerical experiment and on some appropriate real data sets.
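For orientation, the classical BH step-up procedure that the abstract takes as its starting point can be sketched as follows. This is only the standard baseline for generic p-values, not the discrete or heterogeneous procedures developed in the paper; the function name `benjamini_hochberg` and the default level `alpha=0.05` are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Classical BH step-up: return the sorted indices of rejected hypotheses.

    Illustrative sketch only; it ignores discreteness and heterogeneity.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the LARGEST rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


if __name__ == "__main__":
    # The largest p-value meets its threshold (0.045 <= 0.05), so the
    # step-up rule rejects all four hypotheses even though the second
    # and third ordered p-values miss their own thresholds.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.045], alpha=0.05))
```

The scan keeps the largest rank that satisfies the threshold rather than stopping at the first failure, which is what distinguishes a step-up rule from a step-down one.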

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1867-1900.

Dates
Received: November 2017
First available in Project Euclid: 13 June 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1528855551

Digital Object Identifier
doi:10.1214/18-EJS1441

Mathematical Reviews number (MathSciNet)
MR3813600

Zentralblatt MATH identifier
06890101

Subjects
Primary: 62H15: Hypothesis testing
Secondary: 62Q05: Statistical tables

Keywords
False discovery rate; heterogeneous data; discrete hypothesis testing; type I error rate control; adaptive procedure; step-up algorithm; step-down algorithm

Rights
Creative Commons Attribution 4.0 International License.

Citation

Döhler, Sebastian; Durand, Guillermo; Roquain, Etienne. New FDR bounds for discrete and heterogeneous tests. Electron. J. Statist. 12 (2018), no. 1, 1867--1900. doi:10.1214/18-EJS1441. https://projecteuclid.org/euclid.ejs/1528855551


