Statistics Surveys

The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy

Nancy Heckman

Full-text: Open access


The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_{j}(Y_{j}-\mu(t_{j}))^{2}+\lambda \int_{a}^{b}[\mu''(t)]^{2}\,dt$, where the data are $(t_{j},Y_{j})$, $j=1,\ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square-integrable second derivatives, but the calculations can be carried out in a finite-dimensional space. This reduction from minimizing over an infinite-dimensional space to minimizing over a finite-dimensional space occurs for more general objective functions: the data may be related to the function $\mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_{a}^{b}[\mu''(t)]^{2}\,dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green’s functions.
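The finite-dimensional character of the problem can be illustrated with a simple discrete analogue (this sketch is not the paper's RKHS construction): replace the integral penalty $\int[\mu''(t)]^{2}\,dt$ by a sum of squared second differences of $\mu$ at the design points, so the minimizer solves a linear system. The function name `smooth_penalized` and the example data below are illustrative choices, not from the article.

```python
import numpy as np

def smooth_penalized(y, lam):
    """Discrete analogue of the cubic smoothing spline criterion:
    minimize ||y - mu||^2 + lam * ||D2 mu||^2 over mu in R^n,
    where D2 is the second-difference matrix. The minimizer solves
    (I + lam * D2' D2) mu = y."""
    n = len(y)
    # second-difference operator, shape (n-2, n): rows [1, -2, 1]
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    A = np.eye(n) + lam * D2.T @ D2
    return np.linalg.solve(A, y)

# illustrative use: noisy observations of a smooth curve
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(50)
mu_hat = smooth_penalized(y, lam=5.0)
```

As $\lambda \to 0$ the solution interpolates the data ($\hat\mu = Y$ at the design points), and as $\lambda \to \infty$ the second-difference penalty forces the discrete estimate toward a linear fit, mirroring the behavior of the continuous-penalty minimizer.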

Article information

Statist. Surv., Volume 6 (2012), 113-141.

First available in Project Euclid: 16 October 2012


Primary: 62G99: None of the above, but in this section; 46E22: Hilbert spaces with reproducing kernels [See also 47B32]
Secondary: 62G08: Nonparametric regression

Keywords: penalized likelihood; Reproducing Kernel Hilbert Space; splines


Heckman, Nancy. The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy. Statist. Surv. 6 (2012), 113--141. doi:10.1214/12-SS101.



References

  • [1] Andrews, D.F. and Herzberg, A.M. (1985). Data: A Collection of Problems from Many Fields for the Student and Research Worker. Springer-Verlag, New York.
  • [2] Anselone, P.M. and Laurent, P.J. (1967). A general method for the construction of interpolating or smoothing spline-functions. Numerische Mathematik 12, 66–82.
  • [3] Ansley, C., Kohn, R., and Wong, C. (1993). Nonparametric spline regression with prior information. Biometrika 80, 75–88.
  • [4] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society 68, 337–404.
  • [5] Bacchetti, P., Segal, M.R., Hessol, N.A., and Jewell, N.P. (1993). Different AIDS incubation periods and their impacts on reconstructing human immunodeficiency virus epidemics and projecting AIDS incidence. Proceedings of the National Academy of Sciences, USA, 90, 2194–2196.
  • [6] Coddington, E.A. (1961). An Introduction to Ordinary Differential Equations. New Jersey: Prentice-Hall.
  • [7] Crambes, C., Kneip, A., and Sarda, P. (2009). Smoothing splines estimators for functional linear regression. Annals of Statistics 37, 35–72.
  • [8] Eubank, R.L. (1999). Spline Smoothing and Nonparametric Regression, Second Edition. New York: Marcel Dekker.
  • [9] Furrer, E.M. and Nychka, D. A framework to understand the asymptotic properties of Kriging and splines. URL:
  • [10] Green, P.J. and Silverman, B.W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman and Hall.
  • [11] Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics, Springer.
  • [12] Heckman, N. and Ramsay, J.O. (2000). Penalized regression with model based penalties. Canadian Journal of Statistics 28, 241–258.
  • [13] Hofmann, T., Schölkopf, B., and Smola, A. (2008). Kernel methods in machine learning. Annals of Statistics 36, 1171–1220.
  • [14] Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer Series in Statistics, Volume 200.
  • [15] Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications 33, 82–95.
  • [16] Kohn, R. and Ansley, C.F. (1988). Equivalence between Bayesian smoothness priors and optimal smoothing for function estimation. Bayesian Analysis of Time Series and Dynamic Models 1, 393–430.
  • [17] Kolmogorov, A.N. and Fomin, S.V. (1999). Elements of the Theory of Functions and Functional Analysis. Dover Publications.
  • [18] Kreyszig, E. (1989). Introductory Functional Analysis with Applications. Wiley.
  • [19] Li, Xiaochun. (1996). Local Linear Regression versus Backcalculation in Forecasting. Ph.D. thesis, Statistics Department, University of British Columbia.
  • [20] Nychka, D., Wahba, G., Goldfarb, S., and Pugh, T. (1984). Cross-validated spline methods for the estimation of three-dimensional tumor size distributions from observations on two-dimensional cross sections. Journal of the American Statistical Association 78, 832-846.
  • [21] Nychka, D. (2000). Spatial Process Estimates as Smoothers. Smoothing and Regression. Approaches, Computation and Application, ed. Schimek, M. G., Wiley, New York.
  • [22] Ramsay, J.O., Hooker, G., Campbell, D., and Cao, J. (2007). Parameter estimation for differential equations: a generalized smoothing approach. Journal of the Royal Statistical Society, Series B 69, 741-796.
  • [23] Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis, Second Edition. Springer.
  • [24] Rasmussen, C.E. and Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. The MIT Press.
  • [25] Reinsch, C. (1967). Smoothing by spline functions. Numerische Mathematik 10, 177-183.
  • [26] Reinsch, C. (1970). Smoothing by spline functions II. Numerische Mathematik 16, 451-454.
  • [27] Thompson, J.R. and Tapia, R.A. (1990). Nonparametric Function Estimation, Modeling, and Simulation. Society for Industrial Mathematics.
  • [28] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.
  • [29] Wahba, G. (1999). Support vector machines, Reproducing Kernel Hilbert Spaces, and randomized GCV. Advances in Kernel Methods: Support Vector Learning. Bernhard Schölkopf, Christopher J. C. Burges and Alexander J. Smola, Editors. MIT Press, Cambridge, MA, 69–88.
  • [30] Wahba, G. (1990). Spline Models for Observational Data. Philadelphia: Society for Industrial and Applied Mathematics.
  • [31] Wahba, G. (2003). An introduction to Reproducing Kernel Hilbert Spaces and why they are so useful. Proceedings Volume from the 13th IFAC Symposium on System Identification, 27–29. IPV-IFAC Proceedings Volume. Paul M.J. Van Den Hof, Bo Wahlberg and Siep Weiland, Editors.
  • [32] Yuan, M. and Cai, T. (2010). A Reproducing Kernel Hilbert Space approach to functional linear regression. Annals of Statistics 38 3412–3444.