The Annals of Statistics

Component selection and smoothing in multivariate nonparametric regression

Yi Lin and Hao Helen Zhang

We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The “COSSO” is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems and we compare its performance with those of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies.
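The difference between penalizing a sum of component norms and penalizing a squared norm can be illustrated in a much simpler setting than the one studied in the paper. The sketch below assumes a finite-dimensional problem with an orthonormal design, where each component is a short coefficient vector and the fit decouples across components. It uses the standard block soft-thresholding rule for an unsquared-norm penalty, which is an analogue and not the thresholding operation derived in the paper. The function names, the example coefficients, and the value of `lam` are all hypothetical.

```python
import math


def block_soft_threshold(z, lam):
    """Minimizer of 0.5*||z - b||^2 + lam*||b|| (unsquared Euclidean norm).

    The whole block is set to zero when ||z|| <= lam, which is what
    removes a component from the model.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]


def squared_norm_shrink(z, lam):
    """Minimizer of 0.5*||z - b||^2 + lam*||b||^2 (squared norm).

    Every coordinate is scaled by 1/(1 + 2*lam); nothing becomes exactly zero
    unless it was zero already.
    """
    return [v / (1.0 + 2.0 * lam) for v in z]


# Hypothetical unpenalized component estimates: one strong, one weak.
components = {"x1": [3.0, 4.0], "x2": [0.3, 0.4]}
lam = 1.0

selected = {k: block_soft_threshold(z, lam) for k, z in components.items()}
shrunk = {k: squared_norm_shrink(z, lam) for k, z in components.items()}

for name in components:
    print(name, "sum-of-norms:", selected[name], "squared-norm:", shrunk[name])
```

In this toy setting the weak component `x2` is dropped entirely under the unsquared-norm penalty, while the squared-norm penalty only shrinks it, which is the sense in which a sum-of-norms penalty performs component selection.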

Article information

Ann. Statist., Volume 34, Number 5 (2006), 2272-2297.

First available in Project Euclid: 23 January 2007

Primary: 62G05 (Estimation), 62J07 (Ridge regression; shrinkage estimators)
Secondary: 62G20 (Asymptotic properties)

Keywords: smoothing spline ANOVA; method of regularization; nonparametric regression; nonparametric classification; model selection; machine learning


Lin, Yi; Zhang, Hao Helen. Component selection and smoothing in multivariate nonparametric regression. Ann. Statist. 34 (2006), no. 5, 2272--2297. doi:10.1214/009053606000000722.

