International Statistical Review

Smoothing Observational Data: A Philosophy and Implementation for the Health Sciences

Sander Greenland



Standard statistical methods (such as regression analysis) presume the data are generated by an identifiable random process, and attempt to model that process in a parsimonious fashion. In contrast, observational data in the health sciences are generated by complex, nonidentified, and largely nonrandom mechanisms, and are analyzed to form inferences about latent structures. Despite this gap between the methods and reality, most observational data analysis consists of applying standard methods, followed by a narrative discussion of the problems entailed by doing so. Alternative approaches employ latent-structure models that include components for nonidentified mechanisms. Standard methods can still be useful, however, provided their modeling philosophy is modified to encourage preservation of structure rather than parsimonious description. With this modification they can be viewed as smoothing or filtering methods for separating noise from signal before the task of latent-structure modeling begins. Here I give a detailed justification of this view, and a hierarchical-modeling implementation that can be carried out with popular software. Concepts are illustrated in the smoothing of a contingency table from an analysis of magnetic fields and childhood leukemia.
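The abstract treats regression as a smoother inside a hierarchical model. The Python sketch below gives a rough idea of what such smoothing does to a contingency table. It is not the author's implementation, which is not reproduced here. It applies a two-stage normal-normal model to the crude log odds ratios of a case-control table, shrinking each toward a common prior mean.

Everything specific in the sketch is an assumption chosen for illustration:
- the case and control counts are made up and are not the magnetic-field data;
- the function names `log_odds_ratios` and `shrink` are hypothetical;
- the prior mean of 0 is an illustrative choice;
- the semi-Bayes prior variance of 0.5 is an illustrative choice.

```python
import math


def log_odds_ratios(cases, controls):
    """Crude log odds ratios for each exposure level versus level 0,
    with Woolf (delta-method) variances. Assumes all counts are positive."""
    a0, b0 = cases[0], controls[0]
    estimates, variances = [], []
    for a, b in zip(cases[1:], controls[1:]):
        estimates.append(math.log((a * b0) / (b * a0)))
        variances.append(1 / a + 1 / b + 1 / a0 + 1 / b0)
    return estimates, variances


def shrink(estimates, variances, prior_mean=0.0, prior_var=None):
    """Normal-normal (two-stage hierarchical) shrinkage toward prior_mean.

    prior_var given -> semi-Bayes: second-stage variance fixed by the analyst.
    prior_var None  -> empirical Bayes: method-of-moments estimate
                       mean((b - m)^2) - mean(v), truncated at zero.
    Returns (shrunken estimates, approximate variances, prior variance used).
    The variances ignore uncertainty in an estimated prior variance.
    """
    n = len(estimates)
    if prior_var is None:
        spread = sum((b - prior_mean) ** 2 for b in estimates) / n
        prior_var = max(0.0, spread - sum(variances) / n)
    shrunk, post_vars = [], []
    for b, v in zip(estimates, variances):
        w = prior_var / (prior_var + v)  # weight on the data; smaller for noisier cells
        shrunk.append(prior_mean + w * (b - prior_mean))
        post_vars.append(w * v)
    return shrunk, post_vars, prior_var


# Hypothetical counts across five ordered exposure levels (level 0 = reference).
# Made-up numbers for illustration, not the article's data.
cases = [100, 30, 12, 4, 2]
controls = [400, 100, 35, 8, 3]

crude, var = log_odds_ratios(cases, controls)
eb, eb_var, tau2 = shrink(crude, var)               # empirical-Bayes prior variance
sb, sb_var, _ = shrink(crude, var, prior_var=0.5)   # semi-Bayes, analyst-chosen
for k, (c, e, s) in enumerate(zip(crude, eb, sb), start=1):
    print(f"level {k}: crude OR {math.exp(c):.2f}  EB {math.exp(e):.2f}  SB {math.exp(s):.2f}")
```

The sparse high-exposure levels have large Woolf variances, so they are pulled furthest toward the prior mean. Well-populated levels move comparatively little. This is one simple instance of using a hierarchical model to separate noise from signal. A fuller analysis would replace the common prior mean with a second-stage regression on exposure-level covariates.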

Article information

Internat. Statist. Rev., Volume 74, Number 1 (2006), 31-46.

First available in Project Euclid: 29 March 2006


Keywords: Bias; Empirical Bayes; Epidemiologic methods; Hierarchical regression; Penalized likelihood; Sensitivity analysis; Smoothing


Greenland, Sander. Smoothing Observational Data: A Philosophy and Implementation for the Health Sciences. Internat. Statist. Rev. 74 (2006), no. 1, 31--46.
