Statistical Science

Graphical Models for Inference Under Outcome-Dependent Sampling

Vanessa Didelez, Svend Kreiner, and Niels Keiding



We consider situations where data have been collected such that the sampling depends on the outcome of interest and possibly on further covariates, as for instance in case-control studies. Graphical models represent assumptions about the conditional independencies among the variables. By including a node for the sampling indicator, assumptions about sampling processes can be made explicit. We demonstrate how to read off from such graphs whether consistent estimation of the association between exposure and outcome is possible. Moreover, we give sufficient graphical conditions for testing and estimating the causal effect of exposure on outcome. The practical use is illustrated with a number of examples.
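The role of the sampling indicator can be illustrated with a small numerical sketch. It is not taken from the paper: the population probabilities, the sampling probabilities, and the names `odds_ratio` and `select` are hypothetical and chosen only for illustration. The sketch relies on the standard fact that the exposure-outcome odds ratio is unchanged when selection depends on the outcome alone, as in an idealised case-control design, whereas selection that depends jointly on exposure and outcome generally distorts it.

```python
from fractions import Fraction


def odds_ratio(table):
    """Odds ratio of a 2x2 table keyed by (exposure, outcome)."""
    return (table[1, 1] * table[0, 0]) / (table[1, 0] * table[0, 1])


def select(table, p_sample):
    """Unnormalised P(X=x, Y=y, S=1) = P(X=x, Y=y) * P(S=1 | x, y)."""
    return {(x, y): p * p_sample(x, y) for (x, y), p in table.items()}


# Hypothetical population probabilities P(X=x, Y=y); not from the paper.
population = {
    (1, 1): Fraction(3, 100),
    (1, 0): Fraction(27, 100),
    (0, 1): Fraction(2, 100),
    (0, 0): Fraction(68, 100),
}

# Assumed design 1: selection S depends on the outcome Y only
# (cases sampled with probability 9/10, controls with probability 1/20).
p_by_outcome = {1: Fraction(9, 10), 0: Fraction(1, 20)}
sampled_outcome_only = select(population, lambda x, y: p_by_outcome[y])

# Assumed design 2: selection depends jointly on exposure X and outcome Y
# (exposed cases are more likely to be sampled than unexposed cases).
p_joint = {
    (1, 1): Fraction(9, 10),
    (0, 1): Fraction(1, 2),
    (1, 0): Fraction(1, 20),
    (0, 0): Fraction(1, 20),
}
sampled_joint = select(population, lambda x, y: p_joint[x, y])

or_population = odds_ratio(population)
or_outcome_only = odds_ratio(sampled_outcome_only)
or_joint = odds_ratio(sampled_joint)

print("population:", or_population)
print("selection on outcome only:", or_outcome_only)
print("selection on exposure and outcome:", or_joint)
```

Exact fractions are used so that the comparison of odds ratios is not affected by floating-point rounding. Normalising the selected tables is unnecessary because the odds ratio is invariant to rescaling all four cells by a common constant.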

Article information

Statist. Sci., Volume 25, Number 3 (2010), 368-387.

First available in Project Euclid: 4 January 2011

Keywords: Causal inference; collapsibility; odds ratios; selection bias


Didelez, Vanessa; Kreiner, Svend; Keiding, Niels. Graphical Models for Inference Under Outcome-Dependent Sampling. Statist. Sci. 25 (2010), no. 3, 368–387. doi:10.1214/10-STS340.


