The Annals of Applied Statistics

Bayesian nonparametric cross-study validation of prediction methods

Lorenzo Trippa, Levi Waldron, Curtis Huttenhower, and Giovanni Parmigiani

Full-text: Open access


We consider comparisons of statistical learning algorithms using multiple data sets, via leave-one-in cross-study validation: each of the algorithms is trained on one data set; the resulting model is then validated on each remaining data set. This poses two statistical challenges that need to be addressed simultaneously. The first is the assessment of study heterogeneity, with the aim of identifying a subset of studies within which algorithm comparisons can be reliably carried out. The second is the comparison of algorithms using the ensemble of data sets. We address both problems by integrating clustering and model comparison. We formulate a Bayesian model for the array of cross-study validation statistics, which defines clusters of studies with similar properties and provides the basis for meaningful algorithm comparison in the presence of study heterogeneity. We illustrate our approach through simulations involving studies with varying severity of systematic errors, and in the context of medical prognosis for patients diagnosed with cancer, using high-throughput measurements of the transcriptional activity of the tumor’s genes.
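The leave-one-in scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the simulated studies, the two ridge-regression "algorithms", the negative mean squared error score, and every function name are assumptions introduced here. The sketch only shows how the array of cross-study validation statistics is laid out, with one entry per (training study, validation study) pair for each algorithm.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression (stand-in learner); returns coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def neg_mse(beta, X, y):
    """Stand-in validation statistic: higher is better."""
    return -float(np.mean((y - X @ beta) ** 2))

def cross_study_matrix(studies, fit, score):
    """Z[i, j] = score of the model trained on study i, validated on study j.

    Diagonal entries stay NaN: a study is never used to validate
    a model that was trained on it.
    """
    n = len(studies)
    Z = np.full((n, n), np.nan)
    for i, (X_train, y_train) in enumerate(studies):
        model = fit(X_train, y_train)          # train on one study only
        for j, (X_val, y_val) in enumerate(studies):
            if i != j:
                Z[i, j] = score(model, X_val, y_val)
    return Z

# Hypothetical data: four studies sharing one signal; the last study
# carries a systematic shift in the outcome (an assumed "systematic error").
rng = np.random.default_rng(0)
beta_true = rng.normal(size=5)
studies = []
for s in range(4):
    X = rng.normal(size=(60, 5))
    shift = 3.0 if s == 3 else 0.0
    y = X @ beta_true + shift + rng.normal(scale=0.5, size=60)
    studies.append((X, y))

# Two hypothetical algorithms to compare.
algorithms = {
    "ridge_small": lambda X, y: ridge_fit(X, y, 0.1),
    "ridge_large": lambda X, y: ridge_fit(X, y, 100.0),
}

# One validation matrix per algorithm.
Z = {name: cross_study_matrix(studies, fit, neg_mse)
     for name, fit in algorithms.items()}
```

In this toy setup the column for the shifted study has markedly worse scores for both algorithms, which is the kind of heterogeneity that motivates grouping studies before comparing algorithms. The Bayesian clustering model itself is not reproduced here.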

Article information

Ann. Appl. Stat., Volume 9, Number 1 (2015), 402-428.

First available in Project Euclid: 28 April 2015

Keywords: Reproducibility; validation analysis; meta-analysis; random partitions; Bayesian nonparametrics; cancer signatures


Trippa, Lorenzo; Waldron, Levi; Huttenhower, Curtis; Parmigiani, Giovanni. Bayesian nonparametric cross-study validation of prediction methods. Ann. Appl. Stat. 9 (2015), no. 1, 402–428. doi:10.1214/14-AOAS798.



Supplemental materials

  • Supplement to “Bayesian nonparametric cross-study validation of prediction methods”: We discuss results for logistic regression, Poisson regression, proportional hazards models and support vector machine procedures in the supplementary material.