The Annals of Statistics

Estimation of the density of regression errors

Sam Efromovich

Abstract

Estimation of the density of regression errors is a fundamental issue in regression analysis, and it is typically explored via a parametric approach. This article uses a nonparametric approach with the mean integrated squared error (MISE) criterion. It solves a long-standing problem, formulated two decades ago by Mark Pinsker, about estimation of a nonparametric error density in a nonparametric regression setting with the accuracy of an oracle that knows the underlying regression errors. The solution implies that, under a mild assumption on the differentiability of the design density and regression function, the MISE of a data-driven error density estimator attains the minimax rates and sharp constants known for the case of directly observed regression errors. The result holds for error densities with finite and infinite supports. Some extensions of this result to more general heteroscedastic models with possibly dependent errors and predictors are also obtained; in the latter case the marginal error density is estimated. In all considered cases a blockwise-shrinking Efromovich–Pinsker density estimate, based on plugged-in residuals, is used. The obtained results give a theoretical justification for the customary practice in applied regression analysis of treating residuals as proxies for the underlying regression errors. Numerical and real examples are presented and discussed, and the S-PLUS software is available.
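
The plug-in idea in the abstract can be illustrated with a minimal Python sketch. It is not the article's S-PLUS software, and every specific choice in it is an assumption made for illustration: a Nadaraya-Watson smoother with a fixed bandwidth stands in for the regression estimate, the density is expanded in a cosine basis on the observed range of the sample, and equal-length blocks with a simple shrinkage weight stand in for the Efromovich–Pinsker blocks and thresholds. The sketch compares the estimate built from residuals with an "oracle" estimate built from the simulated errors themselves.

```python
import numpy as np


def nw_smoother(x, y, bandwidth):
    """Nadaraya-Watson regression estimate at the sample points (Gaussian kernel).

    Assumption: any reasonable regression estimate could be used here instead.
    """
    diffs = (x[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2)
    return kernel @ y / kernel.sum(axis=1)


def blockwise_shrinkage_density(sample, n_coef=30, block_len=5):
    """Cosine-series density estimate with one shrinkage weight per block.

    Simplified illustration: equal-length blocks, support taken as the
    sample range, and no projection onto nonnegative functions.
    """
    n = sample.size
    a, b = sample.min(), sample.max()
    u = (sample - a) / (b - a)                      # rescale to [0, 1]
    j = np.arange(1, n_coef + 1)
    # Empirical Fourier coefficients for the basis sqrt(2) * cos(pi * j * u).
    theta = np.sqrt(2.0) * np.cos(np.pi * j[:, None] * u[None, :]).mean(axis=1)

    weights = np.zeros(n_coef)
    for start in range(0, n_coef, block_len):
        block = slice(start, start + block_len)
        energy = np.sum(theta[block] ** 2)
        noise = theta[block].size / n               # roughly 1/n per coefficient
        # Shrink the whole block toward zero; drop it if it looks like noise.
        weights[block] = max(0.0, 1.0 - noise / energy) if energy > 0 else 0.0

    def density(e):
        v = (np.atleast_1d(np.asarray(e, dtype=float)) - a) / (b - a)
        basis = np.sqrt(2.0) * np.cos(np.pi * j[:, None] * v[None, :])
        series = 1.0 + (weights * theta) @ basis
        inside = (v >= 0.0) & (v <= 1.0)
        return np.where(inside, series / (b - a), 0.0)

    return density


# Simulated homoscedastic regression: Y = m(X) + error (illustrative choices).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
errors = rng.normal(0.0, 0.5, n)
y = np.sin(2.0 * np.pi * x) + errors

residuals = y - nw_smoother(x, y, bandwidth=0.05)   # proxies for the errors
f_resid = blockwise_shrinkage_density(residuals)    # plug-in estimate
f_oracle = blockwise_shrinkage_density(errors)      # uses the true errors

# Riemann approximation of the integrated squared gap on [-1.5, 1.5].
grid = np.linspace(-1.5, 1.5, 301)
gap = np.mean((f_resid(grid) - f_oracle(grid)) ** 2) * 3.0
print("integrated squared gap between plug-in and oracle estimates:", gap)
```

A small gap in a run like this is only an informal illustration of residuals acting as proxies for the errors; the article's result concerns the asymptotic MISE of its own data-driven estimator.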

Article information

Source
Ann. Statist., Volume 33, Number 5 (2005), 2194-2227.

Dates
First available in Project Euclid: 25 November 2005

Permanent link to this document
https://projecteuclid.org/euclid.aos/1132936561

Digital Object Identifier
doi:10.1214/009053605000000435

Mathematical Reviews number (MathSciNet)
MR2211084

Zentralblatt MATH identifier
1086.62053

Subjects
Primary: 62G07: Density estimation
Secondary: 62G20: Asymptotic properties

Keywords
Asymptotic; error depending on predictor; heteroscedastic regression; infinite and finite supports; oracle; software; wastewater treatment

Citation

Efromovich, Sam. Estimation of the density of regression errors. Ann. Statist. 33 (2005), no. 5, 2194--2227. doi:10.1214/009053605000000435. https://projecteuclid.org/euclid.aos/1132936561


References

  • Achieser, N. I. (1956). Theory of Approximation. Ungar, New York.
  • Akritas, M. G. and Van Keilegom, I. (2001). Non-parametric estimation of the residual distribution. Scand. J. Statist. 28 549--567.
  • Bickel, P. J. and Ritov, Y. (2003). Nonparametric estimators that can be "plugged-in." Ann. Statist. 31 1033--1053.
  • Brown, L. D. and Low, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24 2384--2398.
  • Brown, L. D., Low, M. G. and Zhao, L. H. (1997). Superefficiency in nonparametric function estimation. Ann. Statist. 25 2607--2625.
  • Carroll, R. J. and Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman and Hall, New York.
  • Carroll, R. J., Ruppert, D. and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models. Chapman and Hall, New York.
  • Cheng, F. (2004). Weak and strong uniform consistency of a kernel error density estimator in nonparametric regression. J. Statist. Plann. Inference 119 95--107.
  • Donoho, D. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200--1224.
  • Duncan, A. J. (1986). Quality Control and Industrial Statistics, 5th ed. Irwin, Homewood, IL.
  • Efromovich, S. (1985). Nonparametric estimation of a density with unknown smoothness. Theory Probab. Appl. 30 557--568.
  • Efromovich, S. (1996). On nonparametric regression for iid observations in a general setting. Ann. Statist. 24 1126--1144.
  • Efromovich, S. (1997). Density estimation for the case of supersmooth measurement error. J. Amer. Statist. Assoc. 92 526--535.
  • Efromovich, S. (1999). Nonparametric Curve Estimation: Methods, Theory and Applications. Springer, New York.
  • Efromovich, S. (2001). Density estimation under random censorship and order restrictions: From asymptotic to small samples. J. Amer. Statist. Assoc. 96 667--684.
  • Efromovich, S. (2002). Adaptive estimation of error density in heteroscedastic nonparametric regression. Technical report, Dept. Mathematics and Statistics, Univ. New Mexico.
  • Efromovich, S. (2003). Estimation of the marginal regression error. Technical report, Dept. Mathematics and Statistics, Univ. New Mexico.
  • Efromovich, S. (2004). Density estimation for biased data. Ann. Statist. 32 1137--1161.
  • Efromovich, S. (2004). Infinite-support-error-density estimation. Technical report, Dept. Mathematics and Statistics, Univ. New Mexico.
  • Efromovich, S. (2004). Adaptive estimation of and oracle inequalities for probability densities. Technical report, Dept. Mathematics and Statistics, Univ. New Mexico.
  • Eubank, R. L. (1999). Nonparametric Regression and Spline Smoothing, 2nd ed. Dekker, New York.
  • Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, New York.
  • Goldstein, L. and Messer, K. (1992). Optimal plug-in estimators for nonparametric functional estimation. Ann. Statist. 20 1306--1328.
  • Golubev, G. K. (1992). Nonparametric estimation of smooth probability densities in $L_2$. Problems Inform. Transmission 28 44--54.
  • Golubev, G. K. and Levit, B. Y. (1996). Asymptotically efficient estimation for analytic distributions. Math. Methods Statist. 5 357--368.
  • Hall, P. and Hart, J. D. (1990). Nonparametric regression with long-range dependence. Stochastic Process. Appl. 36 339--351.
  • Hanson, T. and Johnson, W. O. (2002). Modeling regression error with a mixture of Pólya trees. J. Amer. Statist. Assoc. 97 1020--1033.
  • Hart, J. D. (1997). Nonparametric Smoothing and Lack-of-Fit Tests. Springer, New York.
  • Mallat, S. (2000). A Wavelet Tour of Signal Processing, 2nd ed. Academic Press, London.
  • Marron, J. S. and Wand, M. P. (1992). Exact mean integrated squared error. Ann. Statist. 20 712--736.
  • Müller, U. U., Schick, A. and Wefelmeyer, W. (2004). Estimating linear functionals of the error distribution in nonparametric regression. J. Statist. Plann. Inference 119 75--93.
  • Neter, J., Kutner, M., Nachtsheim, C. and Wasserman, W. (1996). Applied Linear Statistical Models, 4th ed. McGraw-Hill, Boston.
  • Nikolskii, S. M. (1975). Approximation of Functions of Several Variables and Imbedding Theorems. Springer, Berlin.
  • Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist. 24 2399--2430.
  • Pinsker, M. S. (1980). Optimal filtering of a square integrable signal in Gaussian white noise. Problems Inform. Transmission 16 52--68.
  • Schipper, M. (1996). Optimal rates and constants in $L_2$-minimax estimation of probability density functions. Math. Methods Statist. 5 253--274.
  • Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
  • Van Keilegom, I. and Veraverbeke, N. (2002). Density and hazard estimation in censored regression models. Bernoulli 8 607--625.
  • Zhang, C.-H. (2005). General empirical Bayes wavelet methods and exactly adaptive minimax estimation. Ann. Statist. 33 54--100.