## Taiwanese Journal of Mathematics

### INTEGRATED CROSS-VALIDATION FOR THE RANDOM DESIGN NONPARAMETRIC REGRESSION

#### Abstract

For random design nonparametric regression, cross-validation is a popular bandwidth selector. It is constructed from the criterion of "weighted" integrated square error, in which the weight is the design density. In practice, however, this weighting makes the associated cross-validation function emphasize regions with many observations, pay little attention to regions with few observations, and ignore regions with none. As a result, the value of the cross-validated bandwidth depends on the distribution of the design points but not on the location of the interval on which the regression function is estimated. Hence, if the realized design contains sparse regions, the cross-validated bandwidth is usually too small, and the corresponding kernel regression function estimate looks rough in those regions. To avoid this drawback of cross-validation, we suggest using the criterion of "unweighted" integrated square error to construct the bandwidth selector. Under this criterion, a bandwidth selector called integrated cross-validation is proposed, and the resulting bandwidth is shown to be asymptotically optimal. Empirical studies demonstrate that the kernel regression function estimate based on the proposed bandwidth is better than the one based on the ordinary cross-validated bandwidth, in the sense that it both looks smoother and yields a smaller sample unweighted integrated square error.
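The contrast described in the abstract can be explored with a small simulation. The sketch below is illustrative only and does not implement the integrated cross-validation selector of the paper, whose definition is not given in the abstract. It assumes a Nadaraya-Watson estimator with a Gaussian kernel, a hypothetical regression function `sin(2*pi*x)`, and a design on [0, 1] that is dense on the left half and sparse on the right half. It computes the ordinary leave-one-out cross-validation score, which averages squared prediction errors at the design points, and the unweighted integrated square error on a uniform grid, which requires the true regression function and is therefore available only in simulation.

```python
import numpy as np

def nw_estimate(x_eval, x, y, h):
    """Nadaraya-Watson estimate at x_eval (Gaussian kernel, bandwidth h)."""
    u = (x_eval[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y) / w.sum(axis=1)

def loo_cv_score(x, y, h):
    """Ordinary leave-one-out CV: mean squared prediction error at the design points."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    np.fill_diagonal(w, 0.0)  # drop each observation from its own fit
    pred = (w @ y) / w.sum(axis=1)
    return float(np.mean((y - pred) ** 2))

def unweighted_ise(x, y, h, m_true, grid):
    """Trapezoid-rule integral of (estimate - truth)^2 over the grid, no density weight."""
    err2 = (nw_estimate(grid, x, y, h) - m_true(grid)) ** 2
    return float(np.sum(0.5 * (err2[1:] + err2[:-1]) * np.diff(grid)))

# Assumed toy setup (not from the paper): 180 points on [0, 0.5], 20 on [0.5, 1].
rng = np.random.default_rng(0)
n = 200
x = np.sort(np.concatenate([rng.uniform(0.0, 0.5, size=180),
                            rng.uniform(0.5, 1.0, size=20)]))
m_true = lambda t: np.sin(2 * np.pi * t)
y = m_true(x) + rng.normal(scale=0.3, size=n)

bandwidths = np.linspace(0.02, 0.3, 29)
grid = np.linspace(0.0, 1.0, 401)
cv = np.array([loo_cv_score(x, y, h) for h in bandwidths])
ise = np.array([unweighted_ise(x, y, h, m_true, grid) for h in bandwidths])

h_cv = bandwidths[np.argmin(cv)]    # bandwidth chosen by ordinary cross-validation
h_ise = bandwidths[np.argmin(ise)]  # minimizer of the unweighted ISE (needs the truth)
print(f"h_cv = {h_cv:.3f}, unweighted ISE at h_cv = {ise[np.argmin(cv)]:.4f}")
print(f"h_ise = {h_ise:.3f}, unweighted ISE at h_ise = {ise.min():.4f}")
```

Because the cross-validation score is an average over the design points, the 180 observations in the dense half dominate it, whereas the grid-based integral gives both halves of [0, 1] equal weight. Comparing `h_cv` with `h_ise` across several seeds and sample sizes gives a rough sense of how far the ordinary selector can sit from the unweighted optimum when the design has sparse regions.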

#### Article information

Source
Taiwanese J. Math., Volume 9, Number 1 (2005), 123-141.

Dates
First available in Project Euclid: 18 July 2017

Permanent link to this document
https://projecteuclid.org/euclid.twjm/1500407750

Digital Object Identifier
doi:10.11650/twjm/1500407750

Mathematical Reviews number (MathSciNet)
MR2122908

Zentralblatt MATH identifier
1064.62045

Subjects
Primary: 62G05: Estimation
Secondary: 62G20: Asymptotic properties

#### Citation

Chang, Tzu-Kuei; Deng, Wen-Shuenn; Lin, Jung-Huei; Chu, C. K. INTEGRATED CROSS-VALIDATION FOR THE RANDOM DESIGN NONPARAMETRIC REGRESSION. Taiwanese J. Math. 9 (2005), no. 1, 123-141. doi:10.11650/twjm/1500407750. https://projecteuclid.org/euclid.twjm/1500407750
