## The Annals of Applied Statistics

### Hypothesis setting and order statistic for robust genomic meta-analysis

#### Abstract

Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper we propose an order statistic of $p$-values (the $r$th ordered $p$-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) “in all studies,” “in the majority of studies” or “in one or more studies,” and specify rOP as a suitable method for detecting DE genes “in the majority of studies.” We develop methods to estimate the parameter $r$ in rOP for real applications. We explore statistical properties of rOP, such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes. Power calculations and simulations show that rOP performs better than the classical Fisher’s method, Stouffer’s method, the minimum $p$-value method and the maximum $p$-value method under the focused hypothesis setting. Theoretically, rOP is shown to be connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples: major depressive disorder, brain cancer and diabetes. The results demonstrate that rOP is a more generalizable, robust and sensitive statistical framework for detecting disease-related markers.
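To make the test statistic concrete, the following is a minimal sketch of rOP for a single gene, not the authors' implementation. It assumes the $K$ study $p$-values are independent and uniform on $(0,1)$ under the null, in which case the $r$th order statistic follows a Beta$(r, K-r+1)$ distribution whose CDF at $x$ equals $P(\mathrm{Binomial}(K, x) \geq r)$. The function names `rop_statistic` and `rop_p_value` and the example $p$-values are hypothetical, and $r$ is fixed by hand rather than estimated as in the paper.

```python
from math import comb


def rop_statistic(p_values, r):
    """Return the r-th smallest p-value (1-indexed) among the K study p-values."""
    k = len(p_values)
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= K")
    return sorted(p_values)[r - 1]


def rop_p_value(p_values, r):
    """Null p-value of the rOP statistic.

    Assumption (not taken from the source page): the K p-values are
    independent Uniform(0, 1) under the null, so the r-th order statistic
    is Beta(r, K - r + 1). For integer r and K its CDF at x is the
    binomial tail P(Binomial(K, x) >= r), computed here directly.
    """
    k = len(p_values)
    x = rop_statistic(p_values, r)
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))


if __name__ == "__main__":
    # Hypothetical p-values for one gene across K = 5 studies.
    p = [0.6, 0.001, 0.03, 0.004, 0.02]
    for r in (1, 4, 5):
        print(r, rop_statistic(p, r), rop_p_value(p, r))
```

With $r = 1$ the sketch reduces to the minimum $p$-value method and with $r = K$ to the maximum $p$-value method. The binomial-tail form also makes the link to vote counting visible: the $r$th ordered $p$-value is at most $\alpha$ exactly when at least $r$ studies have $p$-values at most $\alpha$.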

#### Article information

Source
Ann. Appl. Stat., Volume 8, Number 2 (2014), 777–800.

Dates
First available in Project Euclid: 1 July 2014

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1404229514

Digital Object Identifier
doi:10.1214/13-AOAS683

Mathematical Reviews number (MathSciNet)
MR3262534

Zentralblatt MATH identifier
06333776

#### Citation

Song, Chi; Tseng, George C. Hypothesis setting and order statistic for robust genomic meta-analysis. Ann. Appl. Stat. 8 (2014), no. 2, 777–800. doi:10.1214/13-AOAS683. https://projecteuclid.org/euclid.aoas/1404229514

#### References

• Begum, F., Ghosh, D., Tseng, G. C. and Feingold, E. (2012). Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res. 40 3777–3784.
• Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57 289–300.
• Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
• Berger, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics 24 295–300.
• Berger, R. L. and Hsu, J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statist. Sci. 11 283–319.
• Birnbaum, A. (1954). Combining independent tests of significance. J. Amer. Statist. Assoc. 49 559–574.
• Cooper, H. M., Hedges, L. V. and Valentine, J. C. (2009). The Handbook of Research Synthesis and Meta-Analysis. Russell Sage Foundation, New York.
• Erickson, S., Kim, K. and Allison, D. B. (2009). Meta-Analysis and Combining Information in Genetics and Genomics. Chapman & Hall/CRC, London.
• Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
• Hedges, L. V. and Olkin, I. (1980). Vote-counting methods in research synthesis. Psychol. Bull. 88 359–369.
• Kang, D. D., Sibille, E., Kaminski, N. and Tseng, G. C. (2012). MetaQC: Objective quality control and inclusion/exclusion criteria for genomic meta-analysis. Nucleic Acids Res. 40 e15.
• Li, J. and Tseng, G. C. (2011). An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Ann. Appl. Stat. 5 994–1019.
• Littell, R. C. and Folks, J. L. (1971). Asymptotic optimality of Fisher’s method of combining independent tests. J. Amer. Statist. Assoc. 66 802–806.
• Littell, R. C. and Folks, J. L. (1973). Asymptotic optimality of Fisher’s method of combining independent tests. II. J. Amer. Statist. Assoc. 68 193–194.
• Owen, A. B. (2009). Karl Pearson’s meta-analysis revisited. Ann. Statist. 37 3867–3892.
• Park, P. J., Kong, S. W., Tebaldi, T., Lai, W. R., Kasif, S. and Kohane, I. S. (2009). Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance. Bioinformatics 25 3121–3127.
• Pearson, K. (1934). On a new method of determining “goodness of fit.” Biometrika 26 425–442.
• Qiu, X., Yakovlev, A. et al. (2006). Some comments on instability of false discovery rate estimation. J. Bioinform. Comput. Biol. 4 1057–1068.
• Rhodes, D. R., Barrette, T. R., Rubin, M. A., Ghosh, D. and Chinnaiyan, A. M. (2002). Meta-analysis of microarrays: Interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 62 4427–4433.
• Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Stat. 24 220–238.
• Shen, K. and Tseng, G. C. (2010). Meta-analysis for pathway enrichment analysis when combining multiple genomic studies. Bioinformatics 26 1316–1323.
• Song, C. and Tseng, G. C. (2014a). Supplement to “Hypothesis setting and order statistic for robust genomic meta-analysis.” DOI:10.1214/13-AOAS683SUPPA.
• Song, C. and Tseng, G. C. (2014b). Supplement to “Hypothesis setting and order statistic for robust genomic meta-analysis.” DOI:10.1214/13-AOAS683SUPPB.
• Song, C. and Tseng, G. C. (2014c). Supplement to “Hypothesis setting and order statistic for robust genomic meta-analysis.” DOI:10.1214/13-AOAS683SUPPC.
• Song, C. and Tseng, G. C. (2014d). Supplement to “Hypothesis setting and order statistic for robust genomic meta-analysis.” DOI:10.1214/13-AOAS683SUPPD.
• Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A. and Williams Jr., R. M. (1949). The American Soldier: Adjustment During Army Life. Princeton Univ. Press, Princeton, NJ.
• Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102 15545–15550.
• Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 411–423.
• Tippett, L. H. C. (1931). The Methods of Statistics. Williams & Norgate, London.
• Tseng, G. C., Ghosh, D. and Feingold, E. (2012). Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res. 40 3785–3799.
• Wang, X., Lin, Y., Song, C., Sibille, E. and Tseng, G. C. (2012a). Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: With application to major depressive disorder. BMC Bioinformatics 13 52.
• Wang, X., Kang, D. D., Shen, K., Song, C., Lu, S., Chang, L. C., Liao, S. G., Huo, Z., Tang, S., Kaminski, N. et al. (2012b). An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection. Bioinformatics 28 2534–2536.
• Wilkinson, B. (1951). A statistical consideration in psychological research. Psychol. Bull. 48 156–158.

#### Supplemental materials

• Supplementary material A: Supplement Text. Details of one-sided test modification to avoid discordant effect sizes.
• Supplementary material B: Supplement Theorems 1 and 2. Theorem 1: Asymptotic property of vote counting as $K\rightarrow\infty$. Theorem 2: Asymptotic property of rOP as $K\rightarrow\infty$.
• Supplementary material C: Supplement Tables 1 and 2. Table 1: Detailed information on the combined data sets. Table 2: FDRs for simulation analysis without correlated genes.
• Supplementary material D: Supplement Figures 1 to 7. Figure 1: Results of brain cancer data set using one-sided corrected rOP. Figure 2: Results of MDD data set. Figure 3: Results of diabetes data set. Figure 4: Permutation results of diabetes data set. Figure 5: Results of brain cancer and 1 random MDD data set. Figure 6: Simulation results without correlated genes. Figure 7: Mean rank of different methods for the top $U$ pathways.