## Statistical Science

### On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression

#### Abstract

Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that systematically combines the linear regression method with higher-order difference estimators. The unified framework enriches the existing literature on variance estimation and includes most existing estimators as special cases. More importantly, it provides a way to solve the challenging difference sequence selection problem, which has remained a controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend using the ordinary difference sequence in the unified framework, regardless of whether the sample size is small or the signal-to-noise ratio is large. Finally, to meet the demands of applications, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression, and have made it freely available through CRAN at http://cran.r-project.org/web/packages/.
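
To illustrate what a difference-based estimator computes, the following is a minimal Python sketch of the classical order-$r$ difference estimator built on the ordinary difference sequence, $d_j = (-1)^j \binom{r}{j} / \sqrt{\binom{2r}{r}}$. It is not the unified estimator proposed in the paper and does not reproduce the VarED package; the function names, the sine mean function, and the simulation settings are assumptions chosen only for illustration.

```python
import math
import random


def ordinary_difference_sequence(r):
    """Order-r ordinary difference sequence (hypothetical helper name).

    The coefficients sum to zero and their squares sum to one, because
    sum_j C(r, j)^2 = C(2r, r).
    """
    norm = math.sqrt(math.comb(2 * r, r))
    return [(-1) ** j * math.comb(r, j) / norm for j in range(r + 1)]


def difference_variance(y, r=1):
    """Classical difference-based variance estimate for equally spaced data.

    Averages the squared order-r differences of y; no estimate of the
    mean function is needed. For r = 1 this reduces to the first-order
    estimator sum (y[i+1] - y[i])^2 / (2 (n - 1)).
    """
    d = ordinary_difference_sequence(r)
    n = len(y)
    total = 0.0
    for i in range(n - r):
        diff = sum(d[j] * y[i + j] for j in range(r + 1))
        total += diff * diff
    return total / (n - r)


# Simulated example (assumed settings): smooth mean plus Gaussian noise.
rng = random.Random(0)
n = 500
sigma = 0.5  # true noise standard deviation, so the target variance is 0.25
x = [(i + 1) / n for i in range(n)]
y = [math.sin(2 * math.pi * xi) + rng.gauss(0.0, sigma) for xi in x]

est1 = difference_variance(y, r=1)
est2 = difference_variance(y, r=2)
print("first-order estimate:", est1)
print("second-order estimate:", est2)
```

Both estimates should land near the true variance of 0.25 in this setting, since differencing neighbouring observations largely cancels a smooth mean function. Higher orders remove more of the trend at the cost of a larger estimator variance, which is the trade-off behind the difference sequence selection problem discussed in the abstract.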

#### Article information

Source
Statist. Sci., Volume 32, Number 3 (2017), 455-468.

Dates
First available in Project Euclid: 1 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1504253126

Digital Object Identifier
doi:10.1214/17-STS613

Mathematical Reviews number (MathSciNet)
MR3696005

Zentralblatt MATH identifier
06870255

#### Citation

Dai, Wenlin; Tong, Tiejun; Zhu, Lixing. On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression. Statist. Sci. 32 (2017), no. 3, 455–468. doi:10.1214/17-STS613. https://projecteuclid.org/euclid.ss/1504253126

#### Supplemental materials

• Supplement to “On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression.” DOI:10.1214/17-STS613SUPP.