Electronic Journal of Statistics

Estimation of multiple networks in Gaussian mixture models

Chen Gao, Yunzhang Zhu, Xiaotong Shen, and Wei Pan

Full-text: Open access

Abstract

We aim to estimate multiple networks in the presence of sample heterogeneity, where the independent samples (i.e., observations) may come from different and unknown populations or distributions. Specifically, we consider penalized estimation of multiple precision matrices in the framework of a Gaussian mixture model. A major innovation is to exploit the commonalities across the multiple precision matrices through possibly nonconvex fusion regularization, which, for example, makes it possible to simultaneously discover unknown disease subtypes and detect differential gene (dys)regulations in functional genomics. We embed in the EM algorithm one of two recently proposed methods for estimating multiple precision matrices in Gaussian graphical models. We demonstrate the feasibility and potential usefulness of the proposed methods in an application to glioblastoma subtype discovery and differential gene network analysis with a microarray gene expression data set. We also conduct realistic simulation studies to evaluate and compare the performance of various methods.
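
As a rough illustration of the kind of criterion the abstract describes, a penalized mixture log-likelihood with a sparsity term and a fusion term might be written as below. This is only a sketch under assumed notation, not the paper's exact formulation: the symbols $K$ (number of mixture components), $\pi_k$, $\mu_k$, $\Omega_k = (\omega_{k,jl})$ (the precision matrix of component $k$), the tuning parameters $\lambda_1, \lambda_2$, and the penalty function $J$ are all introduced here for illustration, and which entries are penalized is likewise an assumption.

```latex
% Sketch only: notation and the exact penalty structure are assumptions,
% not taken from the paper.
\max_{\{\pi_k,\,\mu_k,\,\Omega_k\}_{k=1}^{K}} \;
  \sum_{i=1}^{n} \log\Bigl( \sum_{k=1}^{K} \pi_k \,
      \phi\bigl(x_i;\, \mu_k,\, \Omega_k^{-1}\bigr) \Bigr)
  \;-\; \lambda_1 \sum_{k=1}^{K} \sum_{j \neq l}
      J\bigl(\lvert \omega_{k,jl} \rvert\bigr)
  \;-\; \lambda_2 \sum_{k < k'} \sum_{j \neq l}
      J\bigl(\lvert \omega_{k,jl} - \omega_{k',jl} \rvert\bigr)
% \phi(x; \mu, \Sigma) denotes the multivariate Gaussian density.
% J(z) = z gives a convex (lasso-type) penalty;
% J(z) = \min(z, \tau) with \tau > 0 gives a nonconvex truncated version.
```

In a criterion of this form, the first penalty encourages sparse networks within each component, while the second shrinks corresponding entries of different precision matrices toward each other, which is one way the commonalities across networks could be encoded.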

Article information

Source
Electron. J. Statist., Volume 10, Number 1 (2016), 1133-1154.

Dates
Received: March 2015
First available in Project Euclid: 2 May 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1462192266

Digital Object Identifier
doi:10.1214/16-EJS1135

Mathematical Reviews number (MathSciNet)
MR3499523

Zentralblatt MATH identifier
1335.62098

Keywords
Disease subtype discovery; Gaussian graphical model; model-based clustering; non-convex penalty; glioblastoma gene expression

Citation

Gao, Chen; Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei. Estimation of multiple networks in Gaussian mixture models. Electron. J. Statist. 10 (2016), no. 1, 1133–1154. doi:10.1214/16-EJS1135. https://projecteuclid.org/euclid.ejs/1462192266


