The Annals of Statistics

Nonparametric regression penalizing deviations from additivity

M. Studer, B. Seifert, and T. Gasser



Due to the curse of dimensionality, estimation in a multidimensional nonparametric regression model is in general not feasible. Hence, additional restrictions are introduced, and the additive model takes a prominent place. The restrictions imposed can lead to serious bias. Here, a new estimator is proposed which allows penalizing the nonadditive part of a regression function. This offers a smooth choice between the full and the additive model. As a byproduct, this penalty leads to a regularization in sparse regions. If the additive model does not hold, a small penalty introduces an additional bias compared to the full model which is compensated by the reduced bias due to using smaller bandwidths.

For increasing penalties, this estimator converges to the additive smooth backfitting estimator of Mammen, Linton and Nielsen [Ann. Statist. 27 (1999) 1443–1490].

The structure of the estimator is investigated and two algorithms are provided. A proposal for selection of tuning parameters is made and the respective properties are studied. Finally, a finite sample evaluation is performed for simulated and ozone data.
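The idea sketched in the abstract can be illustrated with a toy basis-expansion analogue: penalize only the nonadditive (interaction) part of the fit, so that a zero penalty gives the full model and a very large penalty gives an additive fit. This is a hypothetical, simplified sketch for intuition only, not the authors' kernel-based smooth backfitting estimator; the function name and setup below are illustrative assumptions.

```python
# Toy analogue of "penalizing deviations from additivity": ridge-type
# penalty on the interaction coefficient only (NOT the paper's estimator).
import numpy as np

def penalized_additive_fit(x1, x2, y, lam):
    """Least squares on the basis [1, x1, x2, x1*x2], penalizing only
    the nonadditive term x1*x2 with weight lam."""
    B = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    D = np.diag([0.0, 0.0, 0.0, 1.0])  # penalty acts on interaction only
    return np.linalg.solve(B.T @ B + lam * D, B.T @ y)

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 200)
x2 = rng.uniform(-1, 1, 200)
y = x1 + x2 + 0.5 * x1 * x2  # truth has a nonadditive component

beta_full = penalized_additive_fit(x1, x2, y, lam=0.0)  # full model
beta_add = penalized_additive_fit(x1, x2, y, lam=1e8)   # nearly additive
print(beta_full[3], beta_add[3])  # interaction coefficient shrinks with lam
```

As the penalty grows, the interaction coefficient is driven to zero and the fit becomes additive, mirroring the convergence to the additive smooth backfitting estimator described above; intermediate penalties interpolate smoothly between the two models.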

Article information

Ann. Statist., Volume 33, Number 3 (2005), 1295-1329.

First available in Project Euclid: 1 July 2005


Primary: 62G08: Nonparametric regression
Secondary: 62H99: None of the above, but in this section

Keywords: Nonparametric estimation; additive models; model choice; curse of dimensionality; regularization; parameter selection; AIC


Studer, M.; Seifert, B.; Gasser, T. Nonparametric regression penalizing deviations from additivity. Ann. Statist. 33 (2005), no. 3, 1295--1329. doi:10.1214/009053604000001246.



  • Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21 196–216.
  • Fan, J., Gasser, T., Gijbels, I., Brockmann, M. and Engel, J. (1997). Local polynomial regression: Optimal kernels and asymptotic minimax efficiency. Ann. Inst. Statist. Math. 49 79–99.
  • Gao, F. (2003). Moderate deviations and large deviations for kernel density estimators. J. Theoret. Probab. 16 401–418.
  • Härdle, W., Hall, P. and Marron, J. S. (1988). How far are automatically chosen regression smoothing parameters from their optimum? (with discussion). J. Amer. Statist. Assoc. 83 86–101.
  • Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
  • Hurvich, C., Simonoff, J. and Tsai, C. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 271–293.
  • Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann. Statist. 27 1443–1490.
  • Mammen, E., Marron, J. S., Turlach, B. and Wand, M. (2001). A general projection framework for constrained smoothing. Statist. Sci. 16 232–248.
  • Nielsen, J. and Linton, O. (1998). An optimization interpretation of integration and back-fitting estimators for separable nonparametric models. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 217–222.
  • Nielsen, J. and Sperlich, S. (2005). Smooth backfitting in practice. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 43–61.
  • Rao, C. R. and Kleffe, J. (1988). Estimation of Variance Components and Applications. North-Holland, Amsterdam.
  • Seifert, B. and Gasser, T. (1996). Finite-sample variance of local polynomials: Analysis and solutions. J. Amer. Statist. Assoc. 91 267–275.
  • Seifert, B. and Gasser, T. (2000). Data adaptive ridging in local polynomial regression. J. Comput. Graph. Statist. 9 338–360.
  • Silverman, B. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Statist. 6 177–184.
  • Stone, C. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8 1348–1360.
  • Stone, C. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040–1053.
  • Stone, C. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689–705.
  • Stone, C. (1986). The dimensionality reduction principle for generalized additive models. Ann. Statist. 14 590–606.
  • Studer, M. (2002). Nonparametric regression penalizing deviations from additivity. Ph.D. dissertation 14696, Swiss Federal Institute of Technology, Zurich (ETHZ).