The Annals of Statistics

Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis

Eric J. Tchetgen Tchetgen and Ilya Shpitser


While estimation of the marginal (total) causal effect of a point exposure on an outcome is arguably the most common objective of experimental and observational studies in the health and social sciences, in recent years investigators have also become increasingly interested in mediation analysis. Specifically, upon evaluating the total effect of the exposure, investigators routinely wish to make inferences about the indirect and direct pathways of that effect, that is, the pathways that respectively do and do not pass through a mediator variable occurring after the exposure and before the outcome. Although powerful semiparametric methodologies that produce double robust and highly efficient estimates of the marginal total causal effect have been developed for observational studies, similar methods for mediation analysis are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about so-called marginal natural direct and indirect causal effects, while appropriately accounting for a large number of pre-exposure confounding factors for the exposure and the mediator variables. Our analytic framework is particularly appealing because it gives new insights into issues of efficiency and robustness in the context of mediation analysis. In particular, we propose new multiply robust locally efficient estimators of the marginal natural indirect and direct causal effects, and develop a novel double robust sensitivity analysis framework for the assumption of ignorability of the mediator variable.
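To make the idea of multiple robustness concrete, the following is a minimal illustrative sketch, not the authors' implementation. With binary exposure E and mediator M, the natural direct effect is commonly defined as E[Y(1, M(0))] − E[Y(0, M(0))] and the natural indirect effect as E[Y(1, M(1))] − E[Y(1, M(0))], so the key quantity is the cross-world mean E[Y(1, M(0))]. The abstract gives no formulas; the estimating function below is the influence-function form usually associated with this problem and is stated here as an assumption. The function names `multiply_robust_theta` and `simulate`, and the entire data-generating process, are hypothetical and chosen only so that the true value (4.4) is known.

```python
import numpy as np

def multiply_robust_theta(y, e, m, p_e1, pm1_e0, pm1_e1, mu1_m0, mu1_m1):
    """Estimate E[Y(1, M(0))] for binary exposure e and binary mediator m.

    Nuisance inputs are working-model values given X_i (arrays or scalars):
      p_e1   : P(E=1 | X)
      pm1_e0 : P(M=1 | E=0, X);      pm1_e1 : P(M=1 | E=1, X)
      mu1_m0 : E[Y | X, M=0, E=1];   mu1_m1 : E[Y | X, M=1, E=1]
    """
    f0 = np.where(m == 1, pm1_e0, 1 - pm1_e0)   # f(M_i | E=0, X_i)
    f1 = np.where(m == 1, pm1_e1, 1 - pm1_e1)   # f(M_i | E=1, X_i)
    mu1 = np.where(m == 1, mu1_m1, mu1_m0)      # E[Y | X_i, M_i, E=1]
    # eta(X) = sum over m of E[Y | X, m, E=1] * f(m | E=0, X)
    eta = mu1_m1 * pm1_e0 + mu1_m0 * (1 - pm1_e0)
    # Residual correction for the outcome model, reweighted to M(0)
    term_y = (e == 1) * f0 / (p_e1 * f1) * (y - mu1)
    # Residual correction for the mediator model among the unexposed
    term_m = (e == 0) / (1 - p_e1) * (mu1 - eta)
    return float(np.mean(term_y + term_m + eta))

def simulate(n, seed=0):
    """Hypothetical data-generating process with known E[Y(1, M(0))] = 4.4."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)
    e = rng.binomial(1, 0.3 + 0.4 * x)
    m = rng.binomial(1, 0.2 + 0.3 * e + 0.2 * x)
    y = 1 + 2 * e + 3 * m + x + rng.normal(size=n)
    return x, e, m, y

x, e, m, y = simulate(200_000)

# True nuisance functions implied by the simulation above
truth = dict(p_e1=0.3 + 0.4 * x,
             pm1_e0=0.2 + 0.2 * x, pm1_e1=0.5 + 0.2 * x,
             mu1_m0=3.0 + x, mu1_m1=6.0 + x)

est_all = multiply_robust_theta(y, e, m, **truth)
# Deliberately misspecify one working model at a time
est_bad_exposure = multiply_robust_theta(y, e, m, **{**truth, "p_e1": 0.5})
est_bad_outcome = multiply_robust_theta(
    y, e, m, **{**truth, "mu1_m0": 0.0, "mu1_m1": 0.0})
est_bad_mediator = multiply_robust_theta(
    y, e, m, **{**truth, "pm1_e0": 0.5, "pm1_e1": 0.5})

for name, est in [("all correct", est_all),
                  ("exposure model wrong", est_bad_exposure),
                  ("outcome model wrong", est_bad_outcome),
                  ("mediator model wrong", est_bad_mediator)]:
    print(f"{name}: {est:.3f}")
```

In this toy setting each of the four estimates should land near 4.4, which illustrates the sense in which such an estimator tolerates misspecification of any one of the three working models. In practice the nuisance functions would be fitted from data rather than plugged in, and the simulation says nothing about efficiency or about the sensitivity analysis described in the abstract.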

Article information

Ann. Statist., Volume 40, Number 3 (2012), 1816-1845.

First available in Project Euclid: 16 October 2012

Primary: 62G05: Estimation

Keywords: Natural direct effects; natural indirect effects; double robust; mediation analysis; local efficiency


Tchetgen Tchetgen, Eric J.; Shpitser, Ilya. Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis. Ann. Statist. 40 (2012), no. 3, 1816–1845. doi:10.1214/12-AOS990.


Supplemental materials

  • Supplementary material: Supplemental Appendix to Semiparametric theory for causal mediation analysis. The supplementary material gives the semiparametric efficiency theory for estimation of natural direct effects with a known model for the mediator density. The Appendix also gives the proof of Theorem 3 (stated in the Supplementary Appendix) and of Theorem 4.