Annals of Statistics

Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors

Kyoungjae Lee, Jaeyong Lee, and Lizhen Lin



In this paper we study high-dimensional sparse directed acyclic graph (DAG) models under an empirical sparse Cholesky prior. Among our results, strong model selection consistency, or graph selection consistency, is obtained under conditions more general than those in the existing literature. Compared to Cao, Khare and Ghosh [Ann. Statist. 47 (2019) 319–348], the required conditions are weakened in terms of the dimensionality, the sparsity and the lower bound on the nonzero elements of the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso-type methods. We also derive posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are minimax, or at least nearly minimax, depending on the size of the true sparsity relative to the ambient dimension. A simulation study confirms that the proposed method outperforms competing methods.
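The sparse Cholesky factor in the abstract refers to the modified Cholesky decomposition of the precision matrix, Omega = (I − A)ᵀ D⁻¹ (I − A), in which A is strictly lower triangular and D is diagonal; under a fixed variable ordering, the zeros of A encode the absent edges of the DAG. The sketch below is a generic numerical illustration of this decomposition only (the function name and the route via the covariance Cholesky factor are our own choices, not the authors' prior or algorithm):

```python
import numpy as np

def modified_cholesky(omega):
    """Decompose a positive definite precision matrix as
    Omega = (I - A)^T diag(1/d) (I - A),
    with A strictly lower triangular and d > 0 elementwise."""
    sigma = np.linalg.inv(omega)
    C = np.linalg.cholesky(sigma)   # Sigma = C C^T, C lower triangular
    M = np.linalg.inv(C)            # Omega = M^T M, M lower triangular
    s = np.diag(M)                  # positive row scales of M
    T = M / s[:, None]              # unit lower triangular factor
    d = 1.0 / s**2                  # residual variances
    A = np.eye(omega.shape[0]) - T  # strictly lower triangular coefficients
    return A, d

# Example: AR(1)-type DAG on p = 4 nodes, X_j = 0.5 X_{j-1} + eps_j,
# so A has a single nonzero subdiagonal and d is the all-ones vector.
p = 4
A_true = np.diag([0.5] * (p - 1), k=-1)
d_true = np.ones(p)
T_true = np.eye(p) - A_true
omega = T_true.T @ np.diag(1.0 / d_true) @ T_true

A, d = modified_cholesky(omega)
assert np.allclose(A, A_true) and np.allclose(d, d_true)
```

The decomposition is unique for a fixed ordering, so the sparsity pattern of A is identified with the DAG structure; the paper's prior places sparsity directly on the rows of A.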

Article information

Ann. Statist., Volume 47, Number 6 (2019), 3413-3437.

Received: February 2018
Revised: October 2018
First available in Project Euclid: 31 October 2019


Primary: 62C20: Minimax procedures
Secondary: 62F15: Bayesian inference
Secondary: 62C12: Empirical decision procedures; empirical Bayes procedures

Keywords: DAG model; precision matrix; Cholesky factor; posterior convergence rate; strong model selection consistency


Lee, Kyoungjae; Lee, Jaeyong; Lin, Lizhen. Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors. Ann. Statist. 47 (2019), no. 6, 3413--3437. doi:10.1214/18-AOS1783.



  • Banerjee, S. and Ghosal, S. (2014). Posterior convergence rates for estimating large precision matrices using graphical models. Electron. J. Stat. 8 2111–2137.
  • Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. J. Multivariate Anal. 136 147–162.
  • Ben-David, E., Li, T., Massam, H. and Rajaratnam, B. (2015). High dimensional Bayesian inference for Gaussian directed acyclic graph models. Available at arXiv:1109.4371v5.
  • Bhattacharya, A., Pati, D. and Yang, Y. (2019). Bayesian fractional posteriors. Ann. Statist. 47 39–66.
  • Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg.
  • Cai, T. T., Liu, W. and Zhou, H. H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Statist. 44 455–488.
  • Cai, T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161 781–815.
  • Cai, T. T. and Yuan, M. (2012). Adaptive covariance matrix estimation through block thresholding. Ann. Statist. 40 2014–2042.
  • Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
  • Cai, T. T. and Zhou, H. H. (2012a). Minimax estimation of large covariance matrices under $\ell_{1}$-norm. Statist. Sinica 22 1319–1349.
  • Cai, T. T. and Zhou, H. H. (2012b). Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 2389–2420.
  • Cao, X., Khare, K. and Ghosh, M. (2019). Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models. Ann. Statist. 47 319–348.
  • Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist. 43 1986–2018.
  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics 147 186–197.
  • Gao, C. and Zhou, H. H. (2015). Rate-optimal posterior contraction for sparse PCA. Ann. Statist. 43 785–818.
  • Grünwald, P. and van Ommen, T. (2017). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Bayesian Anal. 12 1069–1103.
  • Huang, J. Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
  • Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 613–636.
  • Khare, K., Oh, S., Rahman, S. and Rajaratnam, B. (2016). A convex framework for high-dimensional sparse Cholesky based covariance estimation. Preprint. Available at arXiv:1610.02436.
  • Lee, K. and Lee, J. (2017). Estimating large precision matrices via modified Cholesky decomposition. Available at arXiv:1707.01143.
  • Lee, K. and Lee, J. (2018). Optimal Bayesian minimax rates for unconstrained large covariance matrices. Bayesian Anal. 13 1211–1229.
  • Lee, K., Lee, J. and Lin, L. (2019). Supplement to “Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors.” DOI:10.1214/18-AOS1783SUPP.
  • Liang, F., Paulo, R., Molina, G., Clyde, M. A. and Berger, J. O. (2008). Mixtures of $g$ priors for Bayesian variable selection. J. Amer. Statist. Assoc. 103 410–423.
  • Martin, R., Mess, R. and Walker, S. G. (2017). Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 23 1822–1847.
  • Martin, R. and Walker, S. G. (2014). Asymptotically minimax empirical Bayes estimation of a sparse normal mean vector. Electron. J. Stat. 8 2188–2206.
  • Miller, J. W. and Dunson, D. B. (2018). Robust Bayesian inference via coarsening. J. Amer. Statist. Assoc. DOI:10.1080/01621459.2018.1469995.
  • Narisetty, N. N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. Ann. Statist. 42 789–817.
  • Pati, D., Bhattacharya, A., Pillai, N. S. and Dunson, D. (2014). Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Statist. 42 1102–1130.
  • Reid, S., Tibshirani, R. and Friedman, J. (2016). A study of error variance estimation in Lasso regression. Statist. Sinica 26 35–67.
  • Ren, Z., Sun, T., Zhang, C.-H. and Zhou, H. H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 43 991–1026.
  • Rothman, A. J., Levina, E. and Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika 97 539–550.
  • Roverato, A. (2000). Cholesky decomposition of a hyper inverse Wishart matrix. Biometrika 87 99–112.
  • Rütimann, P. and Bühlmann, P. (2009). High dimensional sparse covariance estimation via directed acyclic graphs. Electron. J. Stat. 3 1133–1160.
  • Shang, Z. and Clayton, M. K. (2011). Consistency of Bayesian linear model selection with a growing number of parameters. J. Statist. Plann. Inference 141 3463–3474.
  • Shin, M., Bhattacharya, A. and Johnson, V. E. (2018). Scalable Bayesian variable selection using nonlocal prior densities in ultrahigh-dimensional settings. Statist. Sinica 28 1053–1078.
  • Shojaie, A. and Michailidis, G. (2010). Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika 97 519–538.
  • Syring, N. A. and Martin, R. (2016). Scaling the Gibbs posterior credible regions. Preprint. Available at arXiv:1509.00922.
  • van de Geer, S. and Bühlmann, P. (2013). $\ell_{0}$-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist. 41 536–567.
  • Wainwright, M. J. (2009a). Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theory 55 5728–5741.
  • Wainwright, M. J. (2009b). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • Walker, S. and Hjort, N. L. (2001). On Bayesian consistency. J. R. Stat. Soc. Ser. B. Stat. Methodol. 63 811–821.
  • Xiang, R., Khare, K. and Ghosh, M. (2015). High dimensional posterior convergence rates for decomposable graphical models. Electron. J. Stat. 9 2828–2854.
  • Yang, Y., Wainwright, M. J. and Jordan, M. I. (2016). On the computational complexity of high-dimensional Bayesian variable selection. Ann. Statist. 44 2497–2532.
  • Yu, G. and Bien, J. (2017). Learning local dependence in ordered data. J. Mach. Learn. Res. 18 42.
  • Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with $g$-prior distributions. In Bayesian Inference and Decision Techniques. Stud. Bayesian Econometrics Statist. 6 233–243. North-Holland, Amsterdam.

Supplemental materials

  • Minimax Posterior Convergence Rates and Model Selection Consistency in High-dimensional DAG Models based on Sparse Cholesky Factors. We present the proofs of the main results, along with auxiliary results.