The Annals of Statistics

Estimating high-dimensional intervention effects from observational data

Marloes H. Maathuis, Markus Kalisch, and Peter Bühlmann


We assume that we have observational data generated from an unknown underlying directed acyclic graph (DAG) model. A DAG is typically not identifiable from observational data, but it is possible to consistently estimate the equivalence class of a DAG. Moreover, for any given DAG, causal effects can be estimated using intervention calculus. In this paper, we combine these two parts. For each DAG in the estimated equivalence class, we use intervention calculus to estimate the causal effects of the covariates on the response. This yields a collection of estimated causal effects for each covariate. We show that the distinct values in this set can be consistently estimated by an algorithm that uses only local information of the graph. This local approach is computationally fast and feasible in high-dimensional problems. We propose to use summary measures of the set of possible causal effects to determine variable importance. In particular, we use the minimum absolute value of this set, since that is a lower bound on the size of the causal effect. We demonstrate the merits of our methods in a simulation study and on a data set about riboflavin production.
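As a rough illustration of the local idea described in the abstract, the following Python sketch enumerates the possible parent sets of one covariate and computes one candidate causal effect per set. It is a simplified sketch under stated assumptions, not the authors' implementation: it assumes a linear Gaussian model, takes a known covariance matrix rather than data, assumes the equivalence class is supplied as directed parents plus undirected neighbours, and only checks for new colliders at the covariate itself. All names (`solve`, `effect`, `possible_effects`, `cov`, `edges`) are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def effect(cov, x, y, parents):
    """Coefficient of x in the regression of y on x and the given parents.

    If y is itself a parent of x, x cannot affect y, so the effect is 0.
    """
    if y in parents:
        return 0.0
    idx = [x] + sorted(parents)
    a = [[cov[i][j] for j in idx] for i in idx]
    b = [cov[i][y] for i in idx]
    return solve(a, b)[0]


def possible_effects(cov, x, y, directed_parents, undirected_nbrs, edges):
    """Candidate effects of x on y, one per locally valid parent set of x."""
    effects = []
    nbrs = sorted(undirected_nbrs)
    for size in range(len(nbrs) + 1):
        for s in combinations(nbrs, size):
            parents = set(directed_parents) | set(s)
            # Simplified validity check (assumption): orienting s -> x must
            # not create a new collider at x, so every added parent has to be
            # adjacent to every other parent of x.
            valid = all(frozenset((u, v)) in edges
                        for u in s for v in parents if u != v)
            if valid:
                effects.append(effect(cov, x, y, parents))
    return effects


# Toy example (assumed): true model X1 -> X2 -> Y with
# X2 = 0.5 * X1 + noise and Y = 2 * X2 + noise, unit noise variances.
# Variables are indexed 0 (X1), 1 (X2), 2 (Y).
cov = [[1.0, 0.5, 1.0],
       [0.5, 1.25, 2.5],
       [1.0, 2.5, 6.0]]
# The equivalence class is the undirected chain X1 - X2 - Y.
edges = {frozenset((0, 1)), frozenset((1, 2))}

effects_x2 = possible_effects(cov, 1, 2, [], [0, 2], edges)
lower_bound_x2 = min(abs(e) for e in effects_x2)
```

In this toy example the chain has three DAGs in its equivalence class, and the covariate X2 gets three candidate effects, one of which is zero because Y may be a parent of X2. The minimum absolute value is therefore zero, which shows why this summary is a conservative lower bound rather than a point estimate.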

Article information

Ann. Statist., Volume 37, Number 6A (2009), 3133-3164.

First available in Project Euclid: 17 August 2009

Primary: 62-09: Graphical methods; 62H99: None of the above, but in this section

Keywords: Causal analysis; directed acyclic graph (DAG); graphical modeling; intervention calculus; PC-algorithm; sparsity


Maathuis, Marloes H.; Kalisch, Markus; Bühlmann, Peter. Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37 (2009), no. 6A, 3133–3164. doi:10.1214/09-AOS685.

