The Annals of Applied Statistics

Measuring reproducibility of high-throughput experiments

Qunhua Li, James B. Brown, Haiyan Huang, and Peter J. Bickel



Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the “irreproducible discovery rate” (IDR), analogous to the false discovery rate (FDR). This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and for combining replicates.

Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations to be explored. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment.
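As a rough illustration of the curve idea in the abstract, the sketch below computes, for each fraction t, the share of items that rank in the top t fraction of both replicates. This is a minimal sketch under an assumed top-t overlap definition; the function names `ranks_descending` and `overlap_curve` are hypothetical, and the code does not fit the copula mixture model or compute the IDR.

```python
import math


def ranks_descending(scores):
    """Return ranks where 0 marks the largest score (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def overlap_curve(x, y, grid):
    """For each t in grid, fraction of all items in the top t fraction of BOTH replicates.

    Assumed definition for illustration only; larger scores mean stronger signals.
    """
    if len(x) != len(y):
        raise ValueError("replicates must score the same items")
    n = len(x)
    rank_x = ranks_descending(x)
    rank_y = ranks_descending(y)
    curve = []
    for t in grid:
        k = math.floor(t * n)  # number of top-ranked items kept per replicate
        both = sum(1 for a, b in zip(rank_x, rank_y) if a < k and b < k)
        curve.append(both / n)
    return curve


# Only ranks are used, so the two replicates may be on different scales.
rep1 = [4.0, 3.0, 2.0, 1.0]
rep2 = [40.0, 30.0, 20.0, 10.0]
print(overlap_curve(rep1, rep2, [0.5, 1.0]))
```

Under this definition, perfectly concordant rankings trace a curve close to the diagonal, while a curve that flattens as t grows suggests that lower-ranked findings are no longer shared between replicates.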

Article information

Ann. Appl. Stat., Volume 5, Number 3 (2011), 1752-1779.

First available in Project Euclid: 13 October 2011

Keywords: Reproducibility; association; mixture model; copula; iterative algorithm; irreproducible discovery rate; high-throughput experiment; genomics


Li, Qunhua; Brown, James B.; Huang, Haiyan; Bickel, Peter J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5 (2011), no. 3, 1752–1779. doi:10.1214/11-AOAS466.




Supplemental materials

  • Supplementary material: Supplementary materials for “Measuring reproducibility of high-throughput experiments.” This supplement consists of four parts. Part 1 describes the algorithm for estimating parameters in our copula mixture model. Part 2 provides a theoretical justification for the efficiency of our estimator for the proposed copula mixture model when n is large. Part 3 derives the properties of the correspondence curves in Section 2.1.1. Part 4 provides an extension of our model to the case with multiple replicates.