The Annals of Statistics

Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions

Abstract

Causal inference is known to be very challenging when only observational data are available. Randomized experiments are often costly and impractical, and in instrumental variable regression the number of instruments has to exceed the number of causal predictors. It was recently shown in Peters, Bühlmann and Meinshausen (2016) (J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947–1012) that causal inference for the full model is possible when data from distinct observational environments are available, exploiting the fact that the conditional distribution of a response variable is invariant under the correct causal model. Two shortcomings of such an approach are the high computational effort for large-scale data and the assumed absence of hidden confounders. Here, we show that these two shortcomings can be addressed if one is willing to make a more restrictive assumption on the type of interventions that generate different environments. To this end, we consider a different notion of invariance, namely inner-product invariance. By avoiding a computationally cumbersome reverse-engineering approach such as that in Peters, Bühlmann and Meinshausen (2016), this notion allows for large-scale causal inference in linear structural equation models. We discuss identifiability conditions for the causal parameter and derive asymptotic confidence intervals in the low-dimensional setting. In the case of nonidentifiability, we show that the solution set of causal Dantzig has predictive guarantees under certain interventions. We derive finite-sample bounds in the high-dimensional setting and investigate the performance of causal Dantzig on simulated datasets.
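To make the notion of inner-product invariance concrete, the following is a minimal sketch, not the authors' implementation. It assumes two environments and the low-dimensional, unregularized case, and it reads inner-product invariance as the requirement that the inner product E[X(Y − Xᵀβ)] between predictors and residuals be the same in both environments. Under that assumption the causal parameter solves (G₁ − G₂)β = Z₁ − Z₂, where G = XᵀX/n and Z = XᵀY/n are computed per environment. All function names and the numerical moments are hypothetical and chosen only for illustration; the high-dimensional regularized variant mentioned in the abstract is not shown.

```python
def moments(X, Y):
    """Per-environment moments: G = X^T X / n and Z = X^T Y / n."""
    n, p = len(X), len(X[0])
    G = [[sum(row[j] * row[k] for row in X) / n for k in range(p)]
         for j in range(p)]
    Z = [sum(row[j] * y for row, y in zip(X, Y)) / n for j in range(p)]
    return G, Z


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (near-)singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, p))
        x[i] = (M[i][p] - tail) / M[i][i]
    return x


def causal_dantzig_unregularized(G1, Z1, G2, Z2):
    """Hypothetical helper: solve (G1 - G2) beta = Z1 - Z2."""
    p = len(Z1)
    dG = [[G1[j][k] - G2[j][k] for k in range(p)] for j in range(p)]
    dZ = [Z1[j] - Z2[j] for j in range(p)]
    return solve(dG, dZ)


# Illustrative population moments (made up): true beta = (2, -1), and a
# hidden confounder contributes the same term (1, 0) to Z in both
# environments, i.e. Z_e = G_e beta + (1, 0).
G_a, Z_a = [[2, 0], [0, 1]], [5, -1]   # environment without shift
G_b, Z_b = [[3, 1], [1, 3]], [6, -1]   # environment with an additive shift

beta_hat = causal_dantzig_unregularized(G_b, Z_b, G_a, Z_a)
ols_env_a = solve(G_a, Z_a)  # ordinary least squares within environment a
print(beta_hat)   # [2.0, -1.0]
print(ols_env_a)  # [2.5, -1.0], first coefficient biased by the confounder
```

In this toy example the confounding term cancels when the two environments are differenced, which is why the difference-based solve recovers the coefficients while least squares within a single environment does not. If the two Gram matrices coincide, the system is singular and the sketch raises an error, which loosely corresponds to the nonidentifiable case discussed in the abstract.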

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1688–1722.

Dates
Received: June 2017
Revised: April 2018
First available in Project Euclid: 13 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1550026854

Digital Object Identifier
doi:10.1214/18-AOS1732

Mathematical Reviews number (MathSciNet)
MR3911127

Zentralblatt MATH identifier
07053523

Citation

Rothenhäusler, Dominik; Bühlmann, Peter; Meinshausen, Nicolai. Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions. Ann. Statist. 47 (2019), no. 3, 1688–1722. doi:10.1214/18-AOS1732. https://projecteuclid.org/euclid.aos/1550026854

References

• Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Ann. Statist. 1 135–141.
• Andersson, S. A., Madigan, D. and Perlman, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25 505–541.
• Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. J. Amer. Statist. Assoc. 91 444–455.
• Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley, New York.
• Bowden, R. J. and Turkington, D. A. (1990). Instrumental Variables. Cambridge Univ. Press, Cambridge. Reprint of the 1984 original.
• Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• Chickering, D. M. (2003). Optimal structure identification with greedy search. J. Mach. Learn. Res. 3 507–554.
• Didelez, V., Meng, S. and Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statist. Sci. 25 22–40.
• Hauser, A. and Bühlmann, P. (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. J. Mach. Learn. Res. 13 2409–2464.
• Hauser, A. and Bühlmann, P. (2015). Jointly interventional and observational data: Estimation of interventional Markov equivalence classes of directed acyclic graphs. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77 291–318.
• Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J. and Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. Adv. Neural Inf. Process. Syst. 21 689–696.
• Kemmeren, P., Sameith, K., van de Pasch, L. A., Benschop, J. J., Lenstra, T. L., Margaritis, T., O’Duibhir, E., Apweiler, E., van Wageningen, S. et al. (2014). Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors. Cell 157 740–752.
• Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. J. Bus. Econom. Statist. 30 67–80.
• Maathuis, M. H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37 3133–3164.
• Meinshausen, N., Hauser, A., Mooij, J. M., Peters, J., Versteeg, P. and Bühlmann, P. (2016). Methods for causal inference from gene perturbation experiments and validation. Proc. Natl. Acad. Sci. USA 113 7361–7368.
• Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. [Translated and edited by D. Dabrowska and T. Speed] Statist. Sci. 5 (1990) 465–480.
• Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge.
• Peters, J., Bühlmann, P. and Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947–1012. With comments and a rejoinder.
• R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
• Richardson, T. and Robins, J. M. (2013). Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Working Paper 128, Center for Statistics and the Social Sciences, Univ. Washington Series.
• Robins, J. M., Hernan, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550–560.
• Rothenhäusler, D., Bühlmann, P. and Meinshausen, N. (2019). Supplement to “Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions.” DOI:10.1214/18-AOS1732SUPP.
• Rothenhäusler, D., Heinze, C., Peters, J. and Meinshausen, N. (2015). Backshift: Learning causal cyclic graphs from unknown shift interventions. Adv. Neural Inf. Process. Syst. 28 1513–1521.
• Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688–701.
• Shimizu, S., Hoyer, P. O., Hyvärinen, A. and Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7 2003–2030.
• Tian, J. and Pearl, J. (2001). Causal discovery from changes. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI) 512–522.
• VanderWeele, T. J. and Robins, J. M. (2010). Signed directed acyclic graphs for causal inference. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 111–127.
• van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• Verma, T. and Pearl, J. (1991). Equivalence and synthesis of causal models. In Proceedings of the 6th Conference on Uncertainty in Artificial Intelligence (UAI) 255–270.
• Wald, A. (1940). The fitting of straight lines if both variables are subject to error. Ann. Math. Stat. 11 285–300.
• Wang, L. and Tchetgen Tchetgen, E. (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 531–550.
• Wright, P. G. (1928). The Tariff on Animal and Vegetable Oils. The Macmillan Company, New York.
• Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_{q}$ loss in $\ell_{r}$ balls. J. Mach. Learn. Res. 11 3519–3540.

Supplemental materials

• Supplement to “Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions”. The Supplementary Material contains detailed technical proofs.