The Annals of Statistics

A robust and efficient approach to causal inference based on sparse sufficient dimension reduction

Shujie Ma, Liping Zhu, Zhiwei Zhang, Chih-Ling Tsai, and Raymond J. Carroll



A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables may be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Most existing methods require specifying certain parametric models involving the outcome, treatment and confounding variables, and employ a variable selection procedure to identify confounders. However, selection of a proper set of confounders depends on correct specification of the working models. The bias due to model misspecification and incorrect selection of confounding variables can yield misleading results. We propose a robust and efficient approach for inference about the average treatment effect via a flexible modeling strategy incorporating penalized variable selection. Specifically, we consider an estimator based on an efficient influence function that involves a propensity score and an outcome regression. We then propose a new sparse sufficient dimension reduction method to estimate these two functions without making restrictive parametric modeling assumptions. The proposed estimator of the average treatment effect is asymptotically normal and semiparametrically efficient without the need for variable selection consistency. The proposed methods are illustrated via simulation studies and a biomedical application.
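As a concrete illustration of the influence-function-based estimation described in the abstract, the following sketch computes the augmented inverse probability weighting (AIPW) form of the average treatment effect estimator on simulated data. It deliberately substitutes simple OLS outcome regressions and a constant propensity estimate (valid here only because treatment is randomized in the simulation) for the paper's sparse sufficient dimension reduction fits; all variable names and the data-generating model are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative simulation (not the paper's design): randomized treatment,
# linear outcome model with true average treatment effect equal to 2.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
T = rng.binomial(1, 0.5, size=n)                              # randomized treatment
Y = 2.0 * T + X @ np.array([1.0, -0.5]) + rng.normal(size=n)  # true ATE = 2

def ols_predict(X_fit, y_fit, X_new):
    """Fit OLS with an intercept on (X_fit, y_fit) and predict at X_new."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

m1 = ols_predict(X[T == 1], Y[T == 1], X)  # outcome regression E[Y | X, T = 1]
m0 = ols_predict(X[T == 0], Y[T == 0], X)  # outcome regression E[Y | X, T = 0]
e = np.full(n, T.mean())                   # propensity score (constant under randomization)

# AIPW estimator: sample mean of the efficient influence function contribution
ate_aipw = np.mean(m1 - m0 + T * (Y - m1) / e - (1 - T) * (Y - m0) / (1 - e))
print(ate_aipw)
```

The estimate should be close to the true effect of 2; the augmentation terms correct the plug-in regression estimate using inverse-probability-weighted residuals, which is what yields the double robustness and efficiency properties discussed in the abstract.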

Article information

Ann. Statist., Volume 47, Number 3 (2019), 1505–1535.

Received: May 2017
Revised: February 2018
First available in Project Euclid: 13 February 2019

Digital Object Identifier: doi:10.1214/18-AOS1722

Primary: 62G08: Nonparametric regression
Secondary: 62G10: Hypothesis testing; 62G20: Asymptotic properties; 62J07: Ridge regression, shrinkage estimators

Keywords: Average treatment effect; dimension reduction; high-dimensional data; multiple-index model; outcome regression; semiparametric efficiency


Ma, Shujie; Zhu, Liping; Zhang, Zhiwei; Tsai, Chih-Ling; Carroll, Raymond J. A robust and efficient approach to causal inference based on sparse sufficient dimension reduction. Ann. Statist. 47 (2019), no. 3, 1505--1535. doi:10.1214/18-AOS1722.



References

  • Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica 74 235–267.
  • Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61 962–972.
  • Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
  • Belloni, A., Chernozhukov, V. and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 81 608–650.
  • Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802–837.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Bura, E. and Cook, R. D. (2001). Extending sliced inverse regression: The weighted chi-squared test. J. Amer. Statist. Assoc. 96 996–1003.
  • Cao, W., Tsiatis, A. A. and Davidian, M. (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika 96 723–734.
  • Chan, K. C. G. and Yam, S. C. P. (2014). Oracle, multiple robust and multipurpose calibration in a missing response problem. Statist. Sci. 29 380–396.
  • Charlton, K., Kowal, P., Soriano, M. M., Williams, S., Banks, E., Vo, K. and Byles, J. (2014). Fruit and vegetable intake and body mass index in a large sample of middle-aged Australian men and women. Nutrients 6 2305–2319.
  • Chen, L. and Huang, J. Z. (2012). Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. J. Amer. Statist. Assoc. 107 1533–1545.
  • Cook, R. D. and Lee, H. (1999). Dimension reduction in binary response regression. J. Amer. Statist. Assoc. 94 1187–1200.
  • Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. Ann. Statist. 30 455–474.
  • Duan, N. and Li, K.-C. (1991). Slicing regression: A link-free regression method. Ann. Statist. 19 505–530.
  • Farrell, M. H. (2015). Robust inference on average treatment effects with possibly more covariates than observations. J. Econometrics 189 1–23.
  • Feng, Z., Wen, X. M., Yu, Z. and Zhu, L. (2013). On partial sufficient dimension reduction with applications to partially linear multi-index models. J. Amer. Statist. Assoc. 108 237–246.
  • Freedman, D. A. and Berk, R. A. (2008). Weighting regressions by propensity scores. Eval. Rev. 32 392–409.
  • Ghosh, D. (2011). Propensity score modelling in observational studies using dimension reduction methods. Statist. Probab. Lett. 81 813–820.
  • Gong, P., Zhang, C., Lu, Z., Huang, J. Z. and Ye, J. (2013). A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In Proceedings of the 30th International Conference on Machine Learning (ICML) 28 37–45.
  • Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66 315–331.
  • Hastie, T. and Tibshirani, R. (1986). Generalized additive models. Statist. Sci. 1 297–318.
  • Heckman, J. J., Ichimura, H. and Todd, P. (1998). Matching as an econometric evaluation estimator. Rev. Econ. Stud. 65 261–294.
  • Heo, M., Kim, R. S., Wylie-Rosett, J., Allison, D. B., Heymsfield, S. B. and Faith, M. S. (2011). Inverse association between fruit and vegetable intake and BMI even after controlling for demographic, socioeconomic and lifestyle factors. Obesity Facts 4 449–455.
  • Hirano, K., Imbens, G. W. and Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71 1161–1189.
  • Imai, K. and Ratkovic, M. (2014). Covariate balancing propensity score. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 243–263.
  • Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci. 22 523–539.
  • Kaufman, A., Augustson, E. M. and Patrick, H. (2012). Unraveling the relationship between smoking and weight: The role of sedentary behavior. J. Obesity. DOI:10.1155/2012/735465.
  • Li, K.-C. (1991). Sliced inverse regression for dimension reduction. J. Amer. Statist. Assoc. 86 316–342.
  • Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Statist. Assoc. 87 1025–1039.
  • Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the lasso. Ann. Statist. 42 413–468.
  • Luo, W. and Li, B. (2016). Combining eigenvalues and variation of eigenvectors for order determination. Biometrika 103 875–887.
  • Luo, W., Zhu, Y. and Ghosh, D. (2017). On estimating regression-based causal effects using sufficient dimension reduction. Biometrika 104 51–65.
  • Ma, Y. and Zhu, L. (2012). A semiparametric approach to dimension reduction. J. Amer. Statist. Assoc. 107 168–179.
  • Ma, S., Zhu, L., Zhang, Z., Tsai, C.-L. and Carroll, R. J. (2018). Supplement to “A robust and efficient approach to causal inference based on sparse sufficient dimension reduction.” DOI:10.1214/18-AOS1722SUPP.
  • Mack, Y. P. and Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Gebiete 61 405–415.
  • Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
  • Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math. Model. 7 1393–1512.
  • Robins, J. M., Hernan, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550–560.
  • Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41–55.
  • Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. J. Amer. Statist. Assoc. 79 516–524.
  • Rosenbaum, P. R. and Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Amer. Statist. 39 33–38.
  • Rotnitzky, A., Lei, Q., Sued, M. and Robins, J. M. (2012). Improved double-robust estimation in missing data and causal inference models. Biometrika 99 439–456.
  • Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688–701.
  • Schatzkin, A., Subar, A. F., Thompson, F. E., Harlan, L. C., Tangrea, J., Hollenbeck, A. R., Hurwitz, P. E., Coyle, L., Schussler, N., Michaud, D. S., Freedman, L. S., Brown, C. C., Midthune, D. and Kipnis, V. (2001). Design and serendipity in establishing a large cohort with wide dietary intake distributions: The National Institutes of Health-AARP Diet and Health Study. Am. J. Epidemiol. 154 1119–1125.
  • Sekhon, J. S. (2008). Multivariate and propensity score matching software with automated balance optimization: The matching package for R. J. Stat. Softw. 42 1–52.
  • Snowden, J. M., Rose, S. and Mortimer, K. M. (2011). Implementation of G-computation on a simulated data set: Demonstration of a causal inference technique. Am. J. Epidemiol. 173 731–738.
  • Steffen, L. M., Jacobs, D. R., Murtaugh, M. A., Moran, A., Steinberger, J., Hong, C. P. and Sinaiko, A. R. (2003). Whole grain intake is associated with lower body mass and greater insulin sensitivity among adolescents. Am. J. Epidemiol. 158 243–250.
  • Tan, Z. (2006). A distributional approach for causal inference using propensity scores. J. Amer. Statist. Assoc. 101 1619–1637.
  • Tan, Z. (2010). Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika 97 661–682.
  • van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
  • van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2007). Super learner. Stat. Appl. Genet. Mol. Biol. 6 Art. 25, 23.
  • van der Laan, M. J. and Robins, J. M. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer Series in Statistics. Springer, New York.
  • van der Laan, M. J. and Rose, S. (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer, New York.
  • van der Laan, M. J. and Rubin, D. (2006). Targeted maximum likelihood learning. Int. J. Biostat. 2 Art. 11, 40.
  • Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
  • Xia, Y. (2008). A multiple-index model and dimension reduction. J. Amer. Statist. Assoc. 103 1631–1640.
  • Xia, Y., Tong, H., Li, W. K. and Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 363–410.
  • Yin, X., Li, B. and Cook, R. D. (2008). Successive direction extraction for estimating the central subspace in a multiple-index regression. J. Multivariate Anal. 99 1733–1757.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
  • Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.
  • Zhou, S., van de Geer, S. and Bühlmann, P. (2009). Adaptive Lasso for high dimensional regression and Gaussian graphical modeling. Available at arxiv:0903.2515.

Supplemental materials

  • Supplement to “A robust and efficient approach to causal inference based on sparse sufficient dimension reduction”. The supplement contains the technical proof of Theorem 1, two lemmas that will be used in the proof of Theorem 2, and additional simulation studies.