The Annals of Statistics

Bandwidth selection: classical or plug-in?

Clive R. Loader


Bandwidth selection for procedures such as kernel density estimation and local regression has been widely studied over the past decade. Substantial “evidence” has been collected to establish superior performance of modern plug-in methods in comparison to methods such as cross validation; this has ranged from detailed analysis of rates of convergence, to simulations, to superior performance on real datasets.

In this work we take a detailed look at some of this evidence, looking into the sources of differences. Our findings challenge the claimed superiority of plug-in methods on several fronts. First, plug-in methods are heavily dependent on arbitrary specification of pilot bandwidths and fail when this specification is wrong. Second, the often-quoted variability and undersmoothing of cross validation simply reflect the uncertainty of bandwidth selection; plug-in methods reflect this uncertainty by oversmoothing and missing important features when given difficult problems. Third, we look at asymptotic theory. Plug-in methods use available curvature information in an inefficient manner, resulting in inefficient estimates. Previous comparisons with classical approaches penalized the classical approaches for this inefficiency. Asymptotically, the plug-in based estimates are beaten by their own pilot estimates.

Article information

Ann. Statist., Volume 27, Number 2 (1999), 415-438.

First available in Project Euclid: 5 April 2002

Primary: 62G07: Density estimation
Secondary: 62-07: Data analysis; 62-09: Graphical methods; 62G20: Asymptotic properties

Akaike’s information criterion; bandwidth; cross validation; density estimation; local fitting; local likelihood; plug-in


Loader, Clive R. Bandwidth selection: classical or plug-in? Ann. Statist. 27 (1999), no. 2, 415-438. doi:10.1214/aos/1018031201.


  • Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Appl. Statist. 39 357-365.
  • Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71 353-360.
  • Chiu, S. T. (1991). Bandwidth selection for kernel density estimation. Ann. Statist. 19 1883-1905.
  • Cleveland, W. S. (1993). Visualizing Data. Hobart Press, Summit, NJ.
  • Cleveland, W. S. and Devlin, S. J. (1988). Locally weighted regression: an approach to regression analysis by local fitting. J. Amer. Statist. Assoc. 83 596-610.
  • Cleveland, W. S. and Loader, C. R. (1996). Smoothing by local regression: principles and methods. In Statistical Theory and Computational Aspects of Smoothing (W. Härdle and M. G. Schimek, eds.) 10-49. Physica, Heidelberg.
  • Duin, R. P. W. (1976). On the choice of smoothing parameter for Parzen estimators of probability density functions. IEEE Trans. Comput. C-25 1175-1179.
  • Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21 196-216.
  • Gasser, T., Kneip, A. and Köhler, W. (1991). A flexible and fast method for automatic smoothing. J. Amer. Statist. Assoc. 86 643-652.
  • Habbema, J. D. F., Hermans, J. and Van Der Broek, K. (1974). A stepwise discriminant analysis program using density estimation. In COMPSTAT 1974, Proceedings in Computational Statistics, Vienna (G. Bruckman, ed.) 101-110. Physica, Heidelberg.
  • Hall, P., Sheather, S. J., Jones, M. C. and Marron, J. S. (1991). On optimal data-based bandwidth selection in kernel density estimation. Biometrika 78 263-270.
  • Härdle, W., Hall, P. and Marron, J. S. (1992). Regression smoothing parameters that are not far from their optimal. J. Amer. Statist. Assoc. 87 227-233.
  • Henderson, R. (1916). Note on graduation by adjusted average. Trans. Actuarial Soc. America 17 43-48.
  • Hjort, N. L. and Jones, M. C. (1996). Locally parametric nonparametric density estimation. Ann. Statist. 24 1619-1647.
  • Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc. 91 401-407.
  • Lejeune, M. and Sarda, P. (1992). Smooth estimators of distribution and density functions. Comput. Statist. Data Anal. 14 457-471.
  • Loader, C. R. (1996a). Local likelihood density estimation. Ann. Statist. 24 1602-1618.
  • Loader, C. R. (1996b). Local Regression and Likelihood. Electronic book, stat/project/locfit/.
  • Mallows, C. L. (1973). Some comments on Cp. Technometrics 15 661-675.
  • Marron, J. S. (1996). A personal view of smoothing and statistics. In Statistical Theory and Computational Aspects of Smoothing (W. Härdle and M. G. Schimek, eds.) 1-9. Physica, Heidelberg.
  • Marron, J. S. and Wand, M. P. (1992). Exact mean integrated squared error. Ann. Statist. 20 712-736.
  • McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall, London.
  • Park, B. U. and Marron, J. S. (1990). Comparison of data-driven bandwidth selectors. J. Amer. Statist. Assoc. 85 66-72.
  • Park, B. U. and Turlach, B. A. (1992). Practical performance of several data driven bandwidth selectors. Comput. Statist. 7 251-270.
  • Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12 1215-1230.
  • Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27 832-837.
  • Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J. Statist. 9 65-78.
  • Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 90 1257-1270.
  • Schuster, E. F. and Gregory, G. G. (1981). On the nonconsistency of maximum likelihood nonparametric density estimators. In Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface (W. F. Eddy, ed.) 295-298. Springer, Berlin.
  • Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley, New York.
  • Scott, D. W. and Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. J. Amer. Statist. Assoc. 82 1131-1146.
  • Sheather, S. J. (1992). The performance of six popular bandwidth selection methods on some real datasets. Comput. Statist. 7 225-250.
  • Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B 53 683-690.
  • Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
  • Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8 1348-1360.
  • Taylor, C. C. (1989). Bootstrap choice of the smoothing parameter in kernel density estimation. Biometrika 76 705-712.
  • Tibshirani, R. J. and Hastie, T. J. (1987). Local likelihood estimation. J. Amer. Statist. Assoc. 82 559-567.
  • Woodroofe, M. (1970). On choosing a delta sequence. Ann. Math. Statist. 41 1665-1671.