The Annals of Statistics

$\ell_{0}$-penalized maximum likelihood for sparse directed acyclic graphs

Sara van de Geer and Peter Bühlmann

Full-text: Open access


We consider the problem of regularized maximum likelihood estimation for the structure and parameters of a high-dimensional, sparse directed acyclic graphical (DAG) model with Gaussian distribution or, equivalently, of a Gaussian structural equation model. We show that the $\ell_{0}$-penalized maximum likelihood estimator of a DAG has about the same number of edges as the minimal-edge I-MAP (a DAG with the minimal number of edges representing the distribution), and that it converges in Frobenius norm. We allow the number of nodes $p$ to be much larger than the sample size $n$, but we assume a sparsity condition and that any representation of the true DAG has at least a fixed proportion of its nonzero edge weights above the noise level. Our results rely neither on the faithfulness assumption nor on the restrictive strong faithfulness condition, both of which are required for methods based on conditional independence testing, such as the PC-algorithm.
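
To make the idea of an $\ell_{0}$-penalized Gaussian likelihood concrete, the following is a minimal sketch, not the authors' implementation. It assumes mean-zero data, node-specific error variances profiled out by least squares, and a score of the form (sum over nodes of the log residual variance) plus $\lambda^{2}$ times the number of edges. The exact objective and the choice of $\lambda$ in the paper may differ. The names `solve`, `residual_variance` and `l0_score`, the three-node example and the value `lam = 0.3` are all hypothetical. The sketch only scores candidate DAGs supplied by the caller; it does not check acyclicity or search over all DAGs.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x


def residual_variance(data, j, pa):
    """Mean squared residual of regressing column j on columns pa (no intercept)."""
    n = len(data)
    yy = sum(row[j] ** 2 for row in data)
    if not pa:
        return yy / n
    gram = [[sum(row[a] * row[b] for row in data) for b in pa] for a in pa]
    cross = [sum(row[a] * row[j] for row in data) for a in pa]
    coef = solve(gram, cross)  # least-squares edge weights via the normal equations
    rss = yy - sum(c * w for c, w in zip(cross, coef))
    return rss / n


def l0_score(data, parents, lam):
    """Assumed score: sum_j log(residual variance of node j) + lam^2 * (number of edges).

    parents[j] lists the parents of node j; the caller must supply an acyclic graph.
    """
    fit = sum(math.log(residual_variance(data, j, pa)) for j, pa in enumerate(parents))
    n_edges = sum(len(pa) for pa in parents)
    return fit + lam ** 2 * n_edges


# Hypothetical example: simulate from the chain 0 -> 1 -> 2 with Gaussian errors.
rng = random.Random(0)
data = []
for _ in range(500):
    x0 = rng.gauss(0, 1)
    x1 = 0.9 * x0 + rng.gauss(0, 1)
    x2 = 0.9 * x1 + rng.gauss(0, 1)
    data.append([x0, x1, x2])

lam = 0.3  # illustrative penalty level, not a recommended value
candidates = {
    "empty": [[], [], []],
    "chain": [[], [0], [1]],    # 0 -> 1 -> 2
    "full": [[], [0], [0, 1]],  # chain plus the extra edge 0 -> 2
}
scores = {name: l0_score(data, g, lam) for name, g in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this toy setting the penalty term is what separates the chain from the fully connected DAG: adding the edge 0 -> 2 can only lower the fitted residual variance, so without the $\lambda^{2}$ term the denser graph would never score worse.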

Article information

Ann. Statist., Volume 41, Number 2 (2013), 536–567.

First available in Project Euclid: 26 April 2013

Digital Object Identifier: doi:10.1214/13-AOS1085

Primary: 62F12: Asymptotic properties of estimators
Secondary: 62F30: Inference under constraints

Keywords: causal inference; faithfulness condition; Gaussian structural equation model; graphical modeling; high-dimensional inference


van de Geer, Sara; Bühlmann, Peter. $\ell_{0}$-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist. 41 (2013), no. 2, 536–567. doi:10.1214/13-AOS1085.
