Bayesian Analysis

Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling

Andrea Cremaschi, Raffaele Argiento, Katherine Shoemaker, Christine Peterson, and Marina Vannucci

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper, we are interested in situations where the data depart from Gaussianity and therefore require alternative modeling distributions. The multivariate t-distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet t-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Archive.
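The divisor constructions mentioned in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' code. It assumes the scale-mixture form used by Finegold and Drton, in which each Gaussian component is divided by the square root of a Gamma(nu/2, rate nu/2) divisor, and it represents the Dirichlet process clustering of the divisors through a Chinese restaurant process. The names `crp_divisors` and `sample_robust` and the parameters `nu` and `alpha` are hypothetical, and the hierarchical NormCRM prior proposed in the paper is not implemented here.

```python
import numpy as np

def crp_divisors(p, nu, alpha, rng):
    """Draw p divisors whose ties follow a Chinese restaurant process."""
    labels = np.zeros(p, dtype=int)  # cluster label of each variable
    counts = [1]                     # the first variable opens the first cluster
    for j in range(1, p):
        # Join an existing cluster with probability proportional to its size,
        # or open a new one with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels[j] = k
    # One Gamma(shape nu/2, rate nu/2) divisor per cluster, shared by its members.
    tau_unique = rng.gamma(nu / 2, 2 / nu, size=len(counts))
    return tau_unique[labels]

def sample_robust(n, cov, nu, rng, kind="dirichlet", alpha=1.0):
    """Return n draws of Z / sqrt(tau) and the divisors tau (both n x p)."""
    p = cov.shape[0]
    z = rng.multivariate_normal(np.zeros(p), cov, size=n)
    if kind == "classical":      # one divisor per observation
        tau = np.repeat(rng.gamma(nu / 2, 2 / nu, size=(n, 1)), p, axis=1)
    elif kind == "alternative":  # an independent divisor per component
        tau = rng.gamma(nu / 2, 2 / nu, size=(n, p))
    elif kind == "dirichlet":    # divisors clustered within each observation
        tau = np.vstack([crp_divisors(p, nu, alpha, rng) for _ in range(n)])
    else:
        raise ValueError(f"unknown kind: {kind}")
    return z / np.sqrt(tau), tau
```

In this sketch, a small `alpha` pushes the Dirichlet variant toward the classical t (one divisor shared by all components), while a large `alpha` pushes it toward the alternative t (a separate divisor per component).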

Article information

Source
Bayesian Anal., Advance publication (2018), 31 pages.

Dates
First available in Project Euclid: 28 March 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1553738429

Digital Object Identifier
doi:10.1214/19-BA1153

Keywords
graphical models; Bayesian nonparametrics; normalized completely random measures; hierarchical models; radiomics data; t-distribution

Rights
Creative Commons Attribution 4.0 International License.

Citation

Cremaschi, Andrea; Argiento, Raffaele; Shoemaker, Katherine; Peterson, Christine; Vannucci, Marina. Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling. Bayesian Anal., advance publication, 28 March 2019. doi:10.1214/19-BA1153. https://projecteuclid.org/euclid.ba/1553738429


References

  • Aerts, H. J., Velazquez, E. R., Leijenaar, R. T., Parmar, C., Grossmann, P., Carvalho, S., Bussink, J., Monshouwer, R., Haibe-Kains, B., Rietveld, D., et al. (2014). “Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach.” Nature Communications, 5.
  • Argiento, R., Bianchini, I., and Guglielmi, A. (2016). “A blocked Gibbs sampler for NGG-mixture models via a priori truncation.” Statistics and Computing, 26(3): 641–661.
  • Argiento, R., Cremaschi, A., and Vannucci, M. (2019). “Hierarchical Normalized Completely Random Measures to Cluster Grouped Data.” Journal of the American Statistical Association.
  • Argiento, R., Guglielmi, A., Hsiao, C. K., Ruggeri, F., and Wang, C. (2015). “Modeling the association between clusters of SNPs and disease responses.” In Nonparametric Bayesian Inference in Biostatistics, 115–134. Springer.
  • Argiento, R., Guglielmi, A., and Pievatolo, A. (2010). “Bayesian density estimation and model selection using nonparametric hierarchical mixtures.” Computational Statistics and Data Analysis, 54: 816–832.
  • Atay-Kayis, A. and Massam, H. (2005). “A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models.” Biometrika, 92(2): 317–335.
  • Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J. S., Freymann, J. B., Farahani, K., and Davatzikos, C. (2017). “Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features.” Scientific Data, 4: 170117. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5685212/
  • Barbieri, M. M. and Berger, J. O. (2004). “Optimal predictive model selection.” The Annals of Statistics, 32(3): 870–897.
  • Barrios, E., Lijoi, A., Nieto-Barajas, L. E., and Prünster, I. (2013). “Modeling with normalized random measure mixture models.” Statistical Science, 28: 313–334.
  • Bhadra, A., Rao, A., and Baladandayuthapani, V. (2018). “Inferring network structure in non-normal and mixed discrete-continuous genomic data.” Biometrics, 74(1): 185–195.
  • Camerlenghi, F., Lijoi, A., Orbanz, P., and Prünster, I. (2019). “Distribution theory for hierarchical processes.” Annals of Statistics, 47(1): 67–92.
  • Camerlenghi, F., Lijoi, A., and Prünster, I. (2017). “Bayesian prediction with multiple-samples information.” Journal of Multivariate Analysis, 156: 18–28.
  • Camerlenghi, F., Lijoi, A., and Prünster, I. (2018). “Bayesian nonparametric inference beyond the Gibbs-type framework.” Scandinavian Journal of Statistics, 45(4): 1062–1091.
  • Caron, F. and Fox, E. B. (2017). “Sparse graphs using exchangeable random measures.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5): 1295–1366.
  • Cho, H. and Park, H. (2017). “Classification of low-grade and high-grade glioma using multi-modal image radiomics features.” In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 3081–3084.
  • Cremaschi, A., Argiento, R., Shoemaker, K., Peterson, C., and Vannucci, M. (2019). “Supplementary Material for ‘Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling’.” Bayesian Analysis.
  • De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I., and Ruggiero, M. (2015). “Are Gibbs-type priors the most natural generalization of the Dirichlet process?” IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212–229.
  • Dempster, A. (1972). “Covariance selection.” Biometrics, 28: 157–175.
  • Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G., and West, M. (2004). “Sparse graphical models for exploring gene expression data.” Journal of Multivariate Analysis, 90(1): 196–212.
  • Dobra, A., Lenkoski, A., and Rodriguez, A. (2011). “Bayesian Inference for General Gaussian Graphical Models With Application to Multivariate Lattice Data.” Journal of the American Statistical Association, 106(496): 1418–1433.
  • Favaro, S. and Teh, Y. (2013). “MCMC for Normalized Random Measure Mixture Models.” Statistical Science, 28(3): 335–359.
  • Finegold, M. and Drton, M. (2011). “Robust graphical modeling of gene networks using classical and alternative $t$-distributions.” The Annals of Applied Statistics, 1057–1080.
  • Finegold, M. and Drton, M. (2014). “Robust Bayesian Graphical Modeling Using Dirichlet $t$-Distributions.” Bayesian Analysis, 9(3): 521–550.
  • Friedman, J., Hastie, T., and Tibshirani, R. (2008). “Sparse inverse covariance estimation with the graphical lasso.” Biostatistics, 9(3): 432–441.
  • Friedman, N. (2004). “Inferring cellular networks using probabilistic graphical models.” Science, 303(5659): 799–805.
  • Gevaert, O., Mitchell, L. A., Achrol, A. S., Xu, J., Echegaray, S., Steinberg, G. K., Cheshier, S. H., Napel, S., Zaharchuk, G., and Plevritis, S. K. (2014). “Glioblastoma multiforme: exploratory radiogenomic analysis by using quantitative image features.” Radiology, 273(1): 168–174.
  • Geweke, J. (1991). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, volume 196. Federal Reserve Bank of Minneapolis, Research Department, Minneapolis, MN, USA.
  • Gillies, R. J., Kinahan, P. E., and Hricak, H. (2016). “Radiomics: Images Are More than Pictures, They Are Data.” Radiology, 278(2): 563–577.
  • Giudici, P. and Green, P. J. (1999). “Decomposable graphical Gaussian model determination.” Biometrika, 86(4): 785–801.
  • Griffin, J. E. and Stephens, D. A. (2013). “Advances in Markov chain Monte Carlo.” Bayesian Theory and Applications, 104–144.
  • Hogea, C., Davatzikos, C., and Biros, G. (2008). “An image-driven parameter estimation problem for a reaction-diffusion glioma growth model with mass effects.” Journal of Mathematical Biology, 56(6): 793–825.
  • Ishwaran, H. and James, L. F. (2003). “Generalized weighted Chinese restaurant processes for species sampling mixture models.” Statistica Sinica, 1211–1235.
  • James, L., Lijoi, A., and Prünster, I. (2009). “Posterior analysis for normalized random measures with independent increments.” Scandinavian Journal of Statistics, 36: 76–97.
  • Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C., and West, M. (2005). “Experiments in stochastic computation for high-dimensional graphical models.” Statistical Science, 388–400.
  • Lambin, P., Rios Velazquez, E., Leijenaar, R., Carvalho, S., van Stiphout, R. G., et al. (2012). “Radiomics: Extracting more information from medical images using advanced feature analysis.” European Journal of Cancer, 48(4): 441–446.
  • Lauritzen, S. (1996). Graphical Models. Clarendon Press (Oxford and New York).
  • Lenkoski, A. (2013). “A direct sampler for G-Wishart variates.” Stat, 2: 119–128.
  • Lenkoski, A. and Dobra, A. (2011). “Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior.” Journal of Computational and Graphical Statistics, 20(1): 140–157.
  • Lijoi, A., Mena, R. H., and Prünster, I. (2007). “Controlling the reinforcement in Bayesian nonparametric mixture models.” Journal of the Royal Statistical Society B, 69: 715–740.
  • Lijoi, A. and Prünster, I. (2010). “Models beyond the Dirichlet process.” In Hjort, N., Holmes, C., Müller, P., and Walker, S. (eds.), Bayesian Nonparametrics, 80–136. Cambridge University Press.
  • Meinshausen, N. and Bühlmann, P. (2006). “High-dimensional graphs and variable selection with the lasso.” Annals of Statistics, 34(3): 1436–1462.
  • Mohammadi, A. and Wit, E. C. (2015). “Bayesian structure learning in sparse Gaussian graphical models.” Bayesian Analysis, 10(1): 109–138.
  • Morin, O., Vallières, M., Jochems, A., Woodruff, H. C., Valdes, G., Braunstein, S. E., Wildberger, J. E., Villanueva-Meyer, J. E., Kearney, V., Yom, S. S., Solberg, T. D., and Lambin, P. (2018). “A Deep Look into the Future of Quantitative Imaging in Oncology: A Statement of Working Principles and Proposal for Change.” International Journal of Radiation Oncology∗Biology∗Physics.
  • Mukherjee, S. and Speed, T. (2008). “Network inference using informative priors.” Proceedings of the National Academy of Sciences of the United States of America, 105(38): 14313–14318.
  • Neal, R. (2000). “Markov chain sampling methods for Dirichlet process mixture models.” Journal of Computational and Graphical Statistics, 9: 249–265.
  • Parmar, C., Grossmann, P., Bussink, J., Lambin, P., and Aerts, H. J. (2015). “Machine learning methods for quantitative radiomic biomarkers.” Scientific Reports, 5: 13087.
  • Peterson, C., Stingo, F., and Vannucci, M. (2016). “Joint Bayesian variable and graph selection for regression models with network-structured predictors.” Statistics in Medicine, 35(7): 1017–1031.
  • Peterson, C., Stingo, F. C., and Vannucci, M. (2015). “Bayesian Inference of Multiple Gaussian Graphical Models.” Journal of the American Statistical Association, 110(509): 159–174. PMID: 26078481.
  • Peterson, C., Vannucci, M., Karakas, C., Choi, W., Ma, L., and Maletić-Savatić, M. (2013). “Inferring metabolic networks using the Bayesian adaptive graphical lasso with informative priors.” Statistics and Its Interface, 6(4): 547–558.
  • Pitman, J. (1996). “Some developments of the Blackwell-MacQueen urn scheme.” Lecture Notes-Monograph Series, 245–267.
  • Pitman, J. (2003). “Poisson-Kingman Partitions.” In Science and Statistics: a Festschrift for Terry Speed, volume 40 of IMS Lecture Notes-Monograph Series, 1–34. Hayward (USA): Institute of Mathematical Statistics.
  • Pitt, M., Chan, D., and Kohn, R. (2006). “Efficient Bayesian inference for Gaussian copula regression models.” Biometrika, 93(3): 537–554.
  • Regazzini, E., Lijoi, A., and Prünster, I. (2003). “Distributional results for means of random measures with independent increments.” The Annals of Statistics, 31: 560–585.
  • Roverato, A. (2002). “Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models.” Scandinavian Journal of Statistics, 29(3): 391–411.
  • Shoemaker, K., Hobbs, B. P., Bharath, K., Ng, C. S., and Baladandayuthapani, V. (2018). “Tree-based methods for characterizing tumor density heterogeneity.” In Pacific Symposium on Biocomputing, volume 23, 216–227. World Scientific.
  • Stingo, F., Chen, Y., Vannucci, M., Barrier, M., and Mirkes, P. (2010). “A Bayesian graphical modeling approach to microRNA regulatory network inference.” Annals of Applied Statistics, 4(4): 2024–2048.
  • Stingo, F. C., Guindani, M., Vannucci, M., and Calhoun, V. D. (2013). “An integrative Bayesian modeling approach to imaging genetics.” Journal of the American Statistical Association, 108(503): 876–891.
  • Telesca, D., Müller, P., Kornblau, S., Suchard, M., and Ji, Y. (2012). “Modeling protein expression and protein signaling pathways.” Journal of the American Statistical Association, 107(500): 1372–1384.
  • Wang, H. (2012). “Bayesian graphical lasso models and efficient posterior computation.” Bayesian Analysis, 7(2): 771–790.
  • Wang, H. and Li, S. (2012). “Efficient Gaussian graphical model determination under $G$-Wishart prior distributions.” Electronic Journal of Statistics, 6: 168–198.
  • Yuan, M. and Lin, Y. (2007). “Model selection and estimation in the Gaussian graphical model.” Biometrika, 94(1): 19–35.

Supplemental materials

  • Supplementary Material for “Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling”. We include in this file additional theoretical justifications, details of the MCMC updates, as well as some additional results from the applications presented in the paper. Details on the features analysed in the radiomics case study are reported.