Statistical Science

Interval Estimation for Messy Observational Data

Paul Gustafson and Sander Greenland


We review some aspects of Bayesian and frequentist interval estimation, focusing first on their relative strengths and weaknesses when used in “clean” or “textbook” contexts. We then turn to “messy” observational-data situations, in which modeling that acknowledges the limitations of study design and data collection leads to nonidentifiability. We argue, via a series of examples, that Bayesian interval estimation is an attractive way to proceed in this context even for frequentists, because it can be accompanied by a diagnostic in the form of a calibration-sensitivity simulation analysis. We illustrate the basis for this approach through theoretical considerations, simulations and an application to a study of silica exposure and lung cancer.
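
To make the idea of a calibration-sensitivity simulation concrete, the following is a minimal sketch and not the authors' procedure. It assumes a toy, fully identified model (a normal mean with known standard deviation and a conjugate normal prior), whereas the article is concerned with nonidentified models. True parameter values are drawn from a generating distribution, data are simulated, and the fraction of 95% posterior intervals that contain the true value is recorded. The function names and the settings `gen_mean`, `gen_sd`, `n`, `sigma` and `reps` are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def credible_interval(ybar, n, sigma, prior_mean, prior_sd, level=0.95):
    """Equal-tailed posterior interval for a normal mean (known sigma, normal prior)."""
    prior_prec = 1.0 / prior_sd ** 2          # precision contributed by the prior
    data_prec = n / sigma ** 2                # precision contributed by the sample mean
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * ybar)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * post_var ** 0.5
    return post_mean - half_width, post_mean + half_width


def average_coverage(gen_mean, gen_sd, prior_mean=0.0, prior_sd=1.0,
                     n=10, sigma=1.0, reps=20000, seed=1):
    """Coverage averaged over true values drawn from N(gen_mean, gen_sd^2).

    All defaults are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        theta = rng.gauss(gen_mean, gen_sd)            # true parameter value
        ybar = rng.gauss(theta, sigma / n ** 0.5)      # simulated sample mean
        lower, upper = credible_interval(ybar, n, sigma, prior_mean, prior_sd)
        hits += lower <= theta <= upper
    return hits / reps


# Generating distribution equal to the prior: average coverage near nominal.
matched = average_coverage(gen_mean=0.0, gen_sd=1.0)
# Generating distribution shifted away from the prior: coverage degrades.
shifted = average_coverage(gen_mean=3.0, gen_sd=1.0)
print(f"matched: {matched:.3f}  shifted: {shifted:.3f}")
```

When the generating distribution matches the prior, the average coverage should sit close to the nominal level; repeating the run with generating distributions that depart from the prior shows how quickly that calibration is lost in this toy setting.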

Article information

Statist. Sci., Volume 24, Number 3 (2009), 328-342.

First available in Project Euclid: 31 March 2010

Keywords: Bayesian analysis; bias; confounding; epidemiology; hierarchical prior; identifiability; interval coverage; observational studies


Gustafson, Paul; Greenland, Sander. Interval Estimation for Messy Observational Data. Statist. Sci. 24 (2009), no. 3, 328–342. doi:10.1214/09-STS305.

