The Annals of Statistics

Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models

Xuan Cao, Kshitij Khare, and Malay Ghosh

Covariance estimation and selection for high-dimensional multivariate datasets is a fundamental problem in modern statistics. Gaussian directed acyclic graph (DAG) models are a popular class of models used for this purpose. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independence assumptions on the underlying variables. A variety of priors have been developed in recent years for Bayesian inference in DAG models, yet crucial convergence and sparsity selection properties for these models have not been thoroughly investigated. Most of these priors are adaptations or generalizations of the Wishart distribution in the DAG context. In this paper, we consider a flexible and general class of these “DAG-Wishart” priors with multiple shape parameters. Under mild regularity assumptions, we establish strong graph selection consistency and posterior convergence rates for estimation when the number of variables $p$ is allowed to grow at an appropriate subexponential rate with the sample size $n$.
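
The link between Cholesky sparsity and conditional independence mentioned in the abstract can be illustrated with a small numerical sketch. The parameterization below, $\Omega = L D^{-1} L^{T}$ with $L$ unit lower triangular and edges pointing from higher-indexed to lower-indexed variables, is an assumption made for illustration, as are the three-variable chain, the coefficient values, and the function name `precision_from_dag`; none of these are taken from the article.

```python
import numpy as np

def precision_from_dag(B, d):
    """Build the inverse covariance of a linear Gaussian DAG model.

    Assumed (hypothetical) convention: X = B X + eps, where B is strictly
    upper triangular, B[j, i] != 0 means variable i is a parent of j (i > j),
    and eps has independent components with variances d.
    """
    p = B.shape[0]
    L = (np.eye(p) - B).T               # unit lower-triangular Cholesky-type factor
    omega = L @ np.diag(1.0 / d) @ L.T  # Omega = L D^{-1} L^T
    return L, omega

# Illustrative chain 2 -> 1 -> 0 (0-indexed); values are made up.
B = np.zeros((3, 3))
B[0, 1] = 0.5    # variable 1 is a parent of variable 0
B[1, 2] = -0.8   # variable 2 is a parent of variable 1
d = np.array([1.0, 2.0, 0.5])

L, omega = precision_from_dag(B, d)

# L[2, 0] is zero because 2 is not a parent of 0; in this chain the same
# zero shows up in omega[0, 2], i.e. X0 and X2 are independent given X1.
print(L)
print(omega)

# The factor can be recovered from omega with an ordinary Cholesky
# factorization, since omega = C C^T with C = L D^{-1/2}.
C = np.linalg.cholesky(omega)
L_rec = C / np.diag(C)         # rescale each column to a unit diagonal
d_rec = 1.0 / np.diag(C) ** 2  # conditional variances
```

In this toy example the zero pattern of `L` encodes the missing edge directly, which is the kind of structure a graph selection procedure aims to recover from data.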

Article information

Ann. Statist., Volume 47, Number 1 (2019), 319-348.

Received: May 2017
Revised: January 2018
First available in Project Euclid: 30 November 2018

Primary: 62F15: Bayesian inference
Secondary: 62G20: Asymptotic properties

Keywords: Posterior consistency; high-dimensional data; Bayesian DAG models; covariance estimation; graph selection


Cao, Xuan; Khare, Kshitij; Ghosh, Malay. Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models. Ann. Statist. 47 (2019), no. 1, 319–348. doi:10.1214/18-AOS1689.


Supplemental materials

  • Supplement to “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models”. This supplemental file contains additional proofs for theorems and technical lemmas.