Electronic Journal of Statistics

Estimation of high-dimensional graphical models using regularized score matching

Lina Lin, Mathias Drton, and Ali Shojaie

Graphical models are widely used to model stochastic dependences among large collections of variables. We introduce a new method of estimating undirected conditional independence graphs based on the score matching loss, introduced by Hyvärinen (2005), and subsequently extended in Hyvärinen (2007). The regularized score matching method we propose applies to settings with continuous observations and allows for computationally efficient treatment of possibly non-Gaussian exponential family models. In the well-explored Gaussian setting, regularized score matching avoids issues of asymmetry that arise when applying the technique of neighborhood selection, and compared to existing methods that directly yield symmetric estimates, the score matching approach has the advantage that the considered loss is quadratic and gives piecewise linear solution paths under $\ell_{1}$ regularization. Under suitable irrepresentability conditions, we show that $\ell_{1}$-regularized score matching is consistent for graph estimation in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models.
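The quadratic form of the loss in the Gaussian case can be made concrete. For a centered Gaussian with precision matrix $K$ and sample covariance $W$, the score matching loss of Hyvärinen (2005) reduces to $\frac{1}{2}\mathrm{tr}(KWK) - \mathrm{tr}(K)$, which is quadratic in $K$. The following is a minimal sketch, not the authors' implementation: it minimizes this loss plus an $\ell_{1}$ penalty with a generic proximal gradient (ISTA) iteration in plain Python. The function names, the fixed step size, the iteration count, and the choice to penalize only off-diagonal entries are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    p = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]


def sm_loss(K, W):
    """Gaussian score matching loss 0.5 * tr(K W K) - tr(K)."""
    p = len(K)
    KWK = matmul(matmul(K, W), K)
    return sum(0.5 * KWK[j][j] - K[j][j] for j in range(p))


def soft_threshold(x, t):
    """Proximal operator of t * |x| (shrinks x toward zero by t)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def regularized_score_matching(W, lam, n_iter=2000):
    """Proximal gradient (ISTA) for the l1-penalized Gaussian loss.

    Assumption: only off-diagonal entries of K are penalized, and W is
    symmetric positive semidefinite with a positive trace.
    """
    p = len(W)
    # trace(W) is an upper bound on the largest eigenvalue of W, which in
    # turn bounds the Lipschitz constant of the gradient of the smooth part.
    step = 1.0 / sum(W[j][j] for j in range(p))
    # Start from the identity; iterates stay symmetric.
    K = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(n_iter):
        WK, KW = matmul(W, K), matmul(K, W)
        new_K = []
        for i in range(p):
            row = []
            for j in range(p):
                # Gradient of the smooth part: 0.5 * (W K + K W) - I.
                grad = 0.5 * (WK[i][j] + KW[i][j]) - (1.0 if i == j else 0.0)
                z = K[i][j] - step * grad
                row.append(z if i == j else soft_threshold(z, step * lam))
            new_K.append(row)
        K = new_K
    return K
```

With `W = [[2.0, 1.0], [1.0, 2.0]]`, calling `regularized_score_matching(W, 0.0)` should approach the inverse of `W`, while a large penalty such as `lam = 1.0` sets the off-diagonal entry to zero, which corresponds to removing the edge from the estimated graph.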

Article information

Electron. J. Statist., Volume 10, Number 1 (2016), 806-854.

Received: September 2015
First available in Project Euclid: 6 April 2016

Primary: 62H12: Estimation
Secondary: 62F12: Asymptotic properties of estimators

Keywords: Conditional independence graph; exponential family; graphical model; high-dimensional statistics; score matching; sparsity


Lin, Lina; Drton, Mathias; Shojaie, Ali. Estimation of high-dimensional graphical models using regularized score matching. Electron. J. Statist. 10 (2016), no. 1, 806--854. doi:10.1214/16-EJS1126. https://projecteuclid.org/euclid.ejs/1459967424

  • Ahonen, T. J., Xie, J., LeBaron, M. J., Zhu, J., Nurmi, M., Alanen, K., Rui, H. and Nevalainen, M. T. (2003). Inhibition of transcription factor Stat5 induces cell death of human prostate cancer cells. Journal of Biological Chemistry 278 27287–27292.
  • Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science 118 4947–4957.
  • Allen, G. I. and Liu, Z. (2013). A local Poisson graphical model for inferring networks from sequencing data. IEEE Trans. NanoBioscience 12 189–198.
  • Arnold, B. C., Castillo, E. and Sarabia, J. M. (1999). Conditional specification of statistical models. Springer-Verlag, New York.
  • Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.
  • Barber, R. F. and Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electron. J. Stat. 9 567–607.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for high-dimensional data. Springer, Heidelberg.
  • Carbery, A. and Wright, J. (2001). Distributional and $L^q$ norm inequalities for polynomials over convex bodies in $\mathbb{R}^n$. Math. Res. Lett. 8 233–248.
  • Carter, S. L., Brechbühler, C. M., Griffin, M. and Bond, A. T. (2004). Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics 20 2242–2250.
  • Chen, J. and Chen, Z. (2008). Extended Bayesian information criterion for model selection with large model space. Biometrika 95 759–771.
  • Chichignoud, M., Lederer, J. and Wainwright, M. (2014). Tuning Lasso for sup-norm optimality. arXiv:1410.0247.
  • Dawid, A. P. and Musio, M. (2013). Estimation of spatial processes using local scoring rules. AStA Adv. Stat. Anal. 97 173–179.
  • Defazio, A. and Caetano, T. S. (2012). A convex formulation for learning scale-free networks via submodular relaxation. Adv. Neural Inf. Process. Syst. 1250–1258.
  • Dempster, A. P. (1972). Covariance selection. Biometrics 157–175.
  • Dobra, A. and Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. Ann. Appl. Stat. 5 969–993.
  • Drton, M. and Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statist. Sci. 22 430–449.
  • Edwards, D. (2000). Introduction to graphical modelling, Second ed. Springer-Verlag, New York.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499. With discussion, and a rejoinder by the authors.
  • Fan, S., Meng, Q., Auborn, K., Carter, T. and Rosen, E. (2006). BRCA1 and BRCA2 as molecular targets for phytochemicals indole-3-carbinol and genistein in breast and prostate cancer cells. Brit. J. Cancer 94 407–426.
  • Fellinghauer, B., Bühlmann, P., Ryffel, M., von Rhein, M. and Reinhardt, J. D. (2013). Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Comput. Statist. Data Anal. 64 132–152.
  • Finegold, M. and Drton, M. (2011). Robust graphical modeling of gene networks using classical and alternative $t$-distributions. Ann. Appl. Stat. 5 1057–1080.
  • Forbes, P. G. M. and Lauritzen, S. (2015). Linear estimating equations for exponential families with application to Gaussian linear concentration models. Linear Algebra Appl. 473 261–283.
  • Foygel, R. and Drton, M. (2010a). Exact block-wise optimization in group lasso for linear regression. arXiv:1010.3320.
  • Foygel, R. and Drton, M. (2010b). Extended Bayesian information criteria for Gaussian graphical models. Adv. Neural Inf. Process. Syst. 23 2020–2028.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Technical Report, Stanford University.
  • Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
  • Gao, X., Pu, D. Q., Wu, Y. and Xu, H. (2012). Tuning parameter selection for penalized likelihood estimation of Gaussian graphical model. Statist. Sinica 22 1123–1146.
  • Gayther, S. A., de Foy, K. A., Harrington, P., Pharoah, P., Dunsmuir, W. D., Edwards, S. M., Gillett, C., Ardern-Jones, A., Dearnaley, D. P., Easton, D. F. et al. (2000). The frequency of germ-line mutations in the breast cancer predisposition genes BRCA1 and BRCA2 in familial prostate cancer. Cancer Res. 60 4513–4518.
  • Gelman, A. and Meng, X.-L. (1991). A note on bivariate distributions that are conditionally normal. Amer. Statist. 45 125–126.
  • Gu, L., Vogiatzi, P., Puhr, M., Dagvadorj, A., Lutz, J., Ryder, A., Addya, S., Fortina, P., Cooper, C., Leiby, B. et al. (2010). Stat5 promotes metastatic behavior of human prostate cancer cells in vitro and in vivo. Endocr. Relat. Cancer 17 481–493.
  • Han, J.-D. J., Bertin, N., Hao, T., Goldberg, D. S., Berriz, G. F., Zhang, L. V., Dupuy, D., Walhout, A. J., Cusick, M. E., Roth, F. P. et al. (2004). Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature 430 88–93.
  • Höfling, H. and Tibshirani, R. J. (2009). Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. J. Mach. Learn. Res. 10 883–906.
  • Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6 695–709.
  • Hyvärinen, A. (2007). Some extensions of score matching. Comput. Statist. Data Anal. 51 2499–2512.
  • Jalali, A., Ravikumar, P. D., Vasuki, V. and Sanghavi, S. (2011). On learning discrete graphical models using group-sparse regularization. In AISTATS 2011 378–387.
  • Jeong, H., Mason, S. P., Barabási, A.-L. and Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature 411 41–42.
  • Khare, K., Oh, S.-Y. and Rajaratnam, B. (2015). A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees. J. Roy. Statist. Soc. Ser. B 77 803–825.
  • Kingma, D. P. and LeCun, Y. (2010). Regularized estimation of image statistics by score matching. In Adv. Neural Inf. Process. Syst. 1126–1134.
  • Kishi, H., Igawa, M., Kikuno, N., Yoshino, T., Urakami, S. and Shiina, H. (2004). Expression of the survivin gene in prostate cancer: correlation with clinicopathological characteristics, proliferative activity and apoptosis. J. Urology 171 1855–1860.
  • Köster, U. and Hyvärinen, A. (2007). A two-layer ICA-like model estimated by score matching. In ICANN 2007 798–807. Springer.
  • Lauritzen, S. L. (1996). Graphical models 17. Oxford University Press.
  • Le, Q. V., Karpenko, A., Ngiam, J. and Ng, A. Y. (2011). ICA with reconstruction cost for efficient overcomplete feature learning. In Adv. Neural Inf. Process. Syst. 1017–1025.
  • Leclerc, R. D. (2008). Survival of the sparsest: robust gene networks are parsimonious. Mol. Syst. Biol. 4 213.
  • Lee, S.-I., Ganapathi, V. and Koller, D. (2007). Efficient structure learning of Markov networks using $\ell_1$-regularization. In Advances in Neural Information Processing Systems 19 (B. Schölkopf, J. C. Platt and T. Hoffman, eds.) 817–824. MIT Press.
  • Lin, L., Drton, M. and Shojaie, A. (2016). Supplement to “Estimation of high-dimensional graphical models using regularized score matching”.
  • Liu, H., Han, F. and Zhang, C.-h. (2012). Transelliptical graphical models. In Adv. Neural Inf. Process. Syst. 809–817.
  • Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10 2295–2328.
  • Liu, H., Roeder, K. and Wasserman, L. (2010). Stability approach to regularization selection (StARS) for high dimensional graphical models. In Adv. Neural Inf. Process. Syst. 1432–1440.
  • Liu, H., Han, F., Yuan, M., Lafferty, J. and Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40 2293–2326.
  • Loh, P.-L. and Wainwright, M. J. (2012). High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. Ann. Statist. 40 1637–1664.
  • Meinshausen, N. (2008). A note on the Lasso for Gaussian graphical model selection. Statist. Probab. Lett. 78 880–884.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. Roy. Statist. Soc. Ser. B 72 417–473.
  • Mitra, A., Fisher, C., Foster, C., Jameson, C., Barbachanno, Y., Bartlett, J., Bancroft, E., Doherty, R., Kote-Jarai, Z., Peock, S. et al. (2008). Prostate cancer in male BRCA1 and BRCA2 mutation carriers has a more aggressive phenotype. Brit. J. Cancer 98 502–507.
  • Miyamura, M. and Kano, Y. (2006). Robust Gaussian graphical modeling. J. Multivariate Anal. 97 1525–1550.
  • Moser, C., Ruemmele, P., Gehmert, S., Schenk, H., Kreutz, M. P., Mycielska, M. E., Hackl, C., Kroemer, A., Schnitzbauer, A. A., Stoeltzing, O. et al. (2012). STAT5b as molecular target in pancreatic cancer—Inhibition of tumor growth, angiogenesis, and metastases. Neoplasia 14 915–IN12.
  • Okamoto, M. (1973). Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1 763–765.
  • Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. J. Amer. Statist. Assoc. 104 735–746.
  • Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_1$-regularized logistic regression. Ann. Statist. 38 1287–1319.
  • Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
  • Rocha, G. V., Zhao, P. and Yu, B. (2008). A path following algorithm for sparse pseudo-likelihood inverse covariance estimation (SPLICE). Technical Report, University of California, Berkeley.
  • Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist. 35 1012–1030.
  • Roth, V. and Fischer, B. (2008). The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In ICML 848–855.
  • Schwarz, G. E. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • Shah, R. D. and Samworth, R. J. (2013). Variable selection with error control: another look at stability selection. J. Roy. Statist. Soc. Ser. B 75 55–80.
  • Shariat, S. F., Lotan, Y., Saboorian, H., Khoddami, S. M., Roehrborn, C. G., Slawin, K. M. and Ashfaq, R. (2004). Survivin expression is associated with features of biologically aggressive prostate carcinoma. Cancer 100 751–757.
  • Shojaie, A. and Sedaghat, N. (2016). How similar are estimated networks of different cancer subtypes? In Big and Complex Data Analysis: Statistical Methodologies and Applications (S. E. Ahmed, ed.) Springer, New York.
  • Sun, H. and Li, H. (2012). Robust Gaussian graphical modeling via $\ell_1$ penalization. Biometrics 68 1197–1206.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electron. J. Stat. 7 1456–1490.
  • Tryggvadóttir, L., Vidarsdóttir, L., Thorgeirsson, T., Jonasson, J. G., Ólafsdóttir, E. J., Ólafsdóttir, G. H., Rafnar, T., Thorlacius, S., Jonsson, E., Eyfjord, J. E. et al. (2007). Prostate cancer progression and survival in BRCA2 mutation carriers. Journal of the National Cancer Institute 99 929–935.
  • Tseng, P. (2001). Convergence of a block coordinate descent method for non-differentiable minimization. J. Optim. Theory Appl. 109 475–494.
  • Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Comput. 23 1661–1674.
  • Vogel, D. and Fried, R. (2011). Elliptical graphical modelling. Biometrika 98 935–951.
  • Voorman, A., Shojaie, A. and Witten, D. (2014). Graph estimation with joint additive models. Biometrika 101 85–101.
  • Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • Wang, H., Sun, D., Ji, P., Mohler, J. and Zhu, L. (2008). An AR-Skp2 pathway for proliferation of androgen-dependent prostate-cancer cells. Journal of Cell Science 121 2578–2587.
  • Wang, Z., Gao, D., Fukushima, H., Inuzuka, H., Liu, P., Wan, L., Sarkar, F. H. and Wei, W. (2012). Skp2: a novel potential therapeutic target for prostate cancer. Biochimica et Biophysica Acta (BBA)-Reviews on Cancer 1825 11–17.
  • Wu, Z., Cho, H., Hampton, G. M. and Theodorescu, D. (2009). Cdc6 and cyclin E2 are PTEN-regulated genes associated with human prostate cancer metastasis. Neoplasia 11 66–76.
  • Yang, G., Ayala, G., De Marzo, A., Tian, W., Frolov, A., Wheeler, T. M., Thompson, T. C. and Harper, J. W. (2002). Elevated Skp2 protein expression in human prostate cancer: association with loss of the cyclin-dependent kinase inhibitor p27 and PTEN and with reduced recurrence-free survival. Clinical Cancer Research 8 3419–3426.
  • Yang, E., Allen, G., Liu, Z. and Ravikumar, P. K. (2012). Graphical models via generalized linear models. In Adv. Neural Inf. Process. Syst. 1358–1366.
  • Yang, E., Ravikumar, P., Allen, G. I. and Liu, Z. (2013). On graphical models via univariate exponential family distributions. arXiv:1301.4183.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49–67.
  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
