The Annals of Statistics

Effect of dependence on stochastic measures of accuracy of density estimators

Gerda Claeskens and Peter Hall

Full-text: Open access

Abstract

In kernel density estimation, those data values that make a nondegenerate contribution to the estimator (computed at a given point) tend to be spaced well apart. This property has the effect of suppressing many of the conventional consequences of long-range dependence, for example, slower rates of convergence, which might otherwise be revealed by a traditional loss- or risk-based assessment of performance. From that viewpoint, dependence has to be very long-range indeed before a density estimator experiences any first-order effects. However, an analysis in terms of the convergence rate for a particular realization, rather than the rate averaged over all realizations, reveals a very different picture. We show that from that viewpoint, and in the context of functions of Gaussian processes, effects on rates of convergence can become apparent as soon as the boundary between short- and long-range dependence is crossed. For example, the distance between ISE- and MISE-optimal bandwidths is generally of larger order for long-range dependent data. We shed new light on cross-validation, too. In particular we show that the variance of the cross-validation bandwidth is generally larger for long-range dependent data, and that the first-order properties of this bandwidth do not depend on how many data are left out when constructing the cross-validation criterion. Moreover, for long-range dependent data the cross-validation bandwidth is usually perfectly negatively correlated, in the limit, with the optimal stochastic bandwidth.
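To make the two bandwidths named in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It assumes a Gaussian kernel, independent standard normal data (the paper itself treats dependent data), a fixed bandwidth grid, and leave-one-out least-squares cross-validation; the function names `gauss`, `kde`, `ise` and `lscv` are illustrative choices, not taken from the paper. It computes the ISE-optimal (stochastic) bandwidth for one realization and the cross-validation bandwidth for the same sample.

```python
import numpy as np

def gauss(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def kde(x_grid, data, h):
    """Gaussian-kernel density estimate on x_grid with bandwidth h."""
    return gauss(x_grid[:, None] - data[None, :], h).mean(axis=1)

def ise(data, h, x_grid, true_pdf):
    """Integrated squared error for this realization (Riemann sum on x_grid)."""
    err = kde(x_grid, data, h) - true_pdf(x_grid)
    return np.sum(err ** 2) * (x_grid[1] - x_grid[0])

def lscv(data, h):
    """Leave-one-out least-squares cross-validation criterion.

    Uses the closed form for a Gaussian kernel: the integral of the squared
    estimate is a double sum of N(0, 2 h^2) densities at pairwise differences.
    """
    n = data.size
    d = data[:, None] - data[None, :]
    int_f2 = gauss(d, np.sqrt(2.0) * h).sum() / n ** 2
    k = gauss(d, h)
    loo = (k.sum() - np.trace(k)) / (n * (n - 1))  # mean of leave-one-out fits
    return int_f2 - 2.0 * loo

# Assumed setup: i.i.d. N(0, 1) sample, so the true density is known.
rng = np.random.default_rng(0)
data = rng.standard_normal(200)
grid = np.linspace(-5.0, 5.0, 1001)
hs = np.linspace(0.05, 1.5, 60)

def true_pdf(x):
    return gauss(x, 1.0)

h_ise = hs[np.argmin([ise(data, h, grid, true_pdf) for h in hs])]
h_cv = hs[np.argmin([lscv(data, h) for h in hs])]
print("ISE-optimal bandwidth:", h_ise)
print("cross-validation bandwidth:", h_cv)
```

Repeating the last block over many simulated samples would give an empirical view of how the two bandwidths vary and co-vary; replacing the independent sample with a dependent series is left open here, since the sketch makes no attempt to reproduce the paper's setting.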

Article information

Source
Ann. Statist., Volume 30, Number 2 (2002), 431-454.

Dates
First available in Project Euclid: 14 May 2002

Permanent link to this document
https://projecteuclid.org/euclid.aos/1021379860

Digital Object Identifier
doi:10.1214/aos/1021379860

Mathematical Reviews number (MathSciNet)
MR1902894

Zentralblatt MATH identifier
1012.62031

Subjects
Primary: 62G07: Density estimation
Secondary: 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]

Keywords
Bandwidth; cross-validation; Gaussian process; integrated squared error; kernel methods; long-range dependence; nonparametric density estimator; risk-based analysis

Citation

Claeskens, Gerda; Hall, Peter. Effect of dependence on stochastic measures of accuracy of density estimators. Ann. Statist. 30 (2002), no. 2, 431-454. doi:10.1214/aos/1021379860. https://projecteuclid.org/euclid.aos/1021379860


References

  • AHMAD, I. A. (1982). Integrated mean square properties of density estimation by orthogonal series methods for dependent variables. Ann. Inst. Statist. Math. 34 339-350.
  • BARNDORFF-NIELSEN, O. E. and COX, D. R. (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
  • CASTELLANA, J. V. (1989). Integrated consistency of smoothed probability density estimators for stationary sequences. Stochastic Process. Appl. 33 335-346.
  • CASTELLANA, J. V. and LEADBETTER, M. R. (1986). On smoothed probability density estimation for stationary processes. Stochastic Process. Appl. 21 179-193.
  • CHENG, B. and ROBINSON, P. M. (1991). Density estimation in strongly dependent nonlinear time series. Statist. Sinica 1 335-359.
  • CSÖRGŐ, S. and MIELNICZUK, J. (1995). Density estimation under long-range dependence. Ann. Statist. 23 990-999.
  • HALL, P. (1997). Defining and measuring long-range dependence. In Nonlinear Dynamics and Time Series: Building a Bridge between the Natural and Statistical Sciences (C. D. Cutler and D. T. Kaplan, eds.) 153-160. Amer. Math. Soc., Providence, RI.
  • HALL, P. and HART, J.D. (1990). Nonparametric regression with long-range dependence. Stochastic Process. Appl. 36 339-351.
  • HALL, P. and JOHNSTONE, I. (1992). Empirical functionals and efficient smoothing parameter selection (with discussion). J. Roy. Statist. Soc. Ser. B 54 475-531.
  • HALL, P., LAHIRI, S. N. and TRUONG, Y. K. (1995). On bandwidth choice for density estimation with dependent data. Ann. Statist. 23 2241-2263.
  • HALL, P. and MARRON, J. S. (1987). Extent to which least-squares cross-validation minimizes integrated square error in nonparametric density estimation. Probab. Theory Related Fields 74 567-581.
  • HALL, P. and MINNOTTE, M. C. (2000). High-order data sharpening for density estimation. Manuscript.
  • HÄRDLE, W., HALL, P. and MARRON, J. S. (1988). How far are automatically chosen regression smoothing parameters from their optimum? (with discussion). J. Amer. Statist. Assoc. 83 86-101.
  • HÄRDLE, W., LÜTKEPOHL, H. and CHEN, R. (1997). A review of nonparametric time series analysis. Internat. Statist. Rev. 65 49-72.
  • HART, J. D. (1996). Some automated methods of smoothing time-dependent data. J. Nonparametr. Statist. 6 115-142.
  • HART, J. D. and VIEU, P. (1990). Data-driven bandwidth choice for density estimation based on dependent data. Ann. Statist. 18 873-890.
  • KIM, T. Y. and COX, D. D. (1997). A study of bandwidth selection in density estimation under dependence. J. Multivariate Anal. 62 190-203.
  • MASRY, E. and FAN, J. (1997). Local polynomial estimation of regression functions for mixing processes. Scand. J. Statist. 24 165-179.
  • NGUYEN, H. T. (1979). Density estimation in a continuous-time stationary Markov process. Ann. Statist. 7 341-348.
  • PRAKASA RAO, B. L. S. (1978). Density estimation for Markov processes using delta sequences. Ann. Inst. Statist. Math. 30 321-328.
  • ROBINSON, P. M. (1991). Nonparametric function estimation for long memory time series. In Nonparametric and Semiparametric Methods in Econometrics and Statistics: Proceedings of the 5th International Symposium in Economic Theory and Econometrics (W. W. Barnett, J. Powell and G. E. Tauchen, eds.) 437-457. Cambridge Univ. Press.
  • ROSENBLATT, M. (1970). Density estimates and Markov sequences. In Nonparametric Techniques in Statistical Inference (M. L. Puri, ed.) 199-213. Cambridge Univ. Press.
  • ROUSSAS, G. (1969). Nonparametric estimation in Markov processes. Ann. Inst. Statist. Math. 21 73-87.
  • ROUSSAS, G. (1990). Asymptotic normality of the kernel estimate under dependence conditions: application to hazard rate. J. Statist. Plann. Inference 25 81-104.
  • ROUSSAS, G. and IOANNIDES, D. A. (1987). Note on the uniform convergence of density estimates for mixing random variables. Statist. Probab. Lett. 5 279-285.
  • TAQQU, M. S. (1975). Weak convergence to fractional Brownian motion and to the Rosenblatt process. Z. Wahrsch. Verw. Gebiete 31 287-302.
  • TRAN, L. T. (1989). The L1 convergence of kernel density estimates under dependence. Canad. J. Statist. 17 197-208.
  • TRAN, L. T. (1990). Kernel density estimation under dependence. Statist. Probab. Lett. 10 193-201.