Statistical Science

Bayesian backfitting (with comments and a rejoinder by the authors)

Trevor Hastie and Robert Tibshirani

Full-text: Open access


We propose general procedures for posterior sampling from additive and generalized additive models. The procedure is a stochastic generalization of the well-known backfitting algorithm for fitting additive models. One chooses a linear operator (“smoother”) for each predictor, and the algorithm requires only the application of the operator and its square root. The procedure is general and modular, and we describe its application to nonparametric, semiparametric and mixed models.
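The abstract does not spell out the update, so the following is only a minimal sketch of one way a stochastic backfitting sweep could look. It assumes Gaussian errors with known standard deviation `sigma`, and symmetric smoothers S_j for which the conditional draw of each component is S_j r_j + sigma * S_j^(1/2) z, where r_j is the partial residual and z is standard normal noise. To keep the example dependency-free, each smoother here is a rank-one ridge operator on a single predictor. That choice, and the names `make_ridge_smoother` and `bayesian_backfit`, are illustrative assumptions, not the authors' code.

```python
import math
import random


def make_ridge_smoother(x, lam):
    """Build the rank-one ridge smoother S = x x^T / (x^T x + lam).

    Returns two functions that apply S and S^(1/2) to a vector.
    S is symmetric with a single nonzero eigenvalue, so its square
    root only needs the square root of that eigenvalue.
    Assumes x is not the zero vector.
    """
    norm2 = sum(v * v for v in x)
    shrink = norm2 / (norm2 + lam)           # nonzero eigenvalue of S
    u = [v / math.sqrt(norm2) for v in x]    # matching unit eigenvector

    def scaled_projection(vec, scale):
        coef = scale * sum(ui * vi for ui, vi in zip(u, vec))
        return [coef * ui for ui in u]

    def apply_s(vec):
        return scaled_projection(vec, shrink)

    def apply_sqrt_s(vec):
        return scaled_projection(vec, math.sqrt(shrink))

    return apply_s, apply_sqrt_s


def bayesian_backfit(y, smoothers, sigma, n_iter, rng):
    """Run n_iter stochastic backfitting sweeps; return one draw per sweep.

    Each draw is a list of fitted component vectors f_1, ..., f_p.
    """
    n, p = len(y), len(smoothers)
    f = [[0.0] * n for _ in range(p)]
    draws = []
    for _ in range(n_iter):
        for j, (apply_s, apply_sqrt_s) in enumerate(smoothers):
            # Partial residual: remove every component except the j-th.
            r = [y[i] - sum(f[k][i] for k in range(p) if k != j)
                 for i in range(n)]
            mean = apply_s(r)                          # ordinary backfitting step
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            noise = apply_sqrt_s(z)                    # has covariance S
            f[j] = [m + sigma * e for m, e in zip(mean, noise)]
        draws.append([fj[:] for fj in f])
    return draws


# Toy usage with two centered, orthogonal predictors (illustrative data).
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 1.0, -1.0, 1.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
smoothers = [make_ridge_smoother(x1, 1.0), make_ridge_smoother(x2, 1.0)]
draws = bayesian_backfit(y, smoothers, sigma=0.5, n_iter=200,
                         rng=random.Random(0))
```

Setting `sigma` to zero removes the noise term, and the loop reduces to ordinary deterministic backfitting. This is the sense in which the sampler generalizes that algorithm. A richer smoother such as a smoothing spline would slot in by supplying its own pair of "apply S" and "apply S^(1/2)" functions.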

Article information

Statist. Sci., Volume 15, Number 3 (2000), 196-223.

First available in Project Euclid: 24 December 2001

Keywords: Additive models, backfitting, Bayes, Gibbs sampling, random effects, Metropolis–Hastings procedure


Hastie, Trevor; Tibshirani, Robert. Bayesian backfitting (with comments and a rejoinder by the authors). Statist. Sci. 15 (2000), no. 3, 196-223. doi:10.1214/ss/1009212815.


