Electronic Journal of Statistics

Nonparametric inference via bootstrapping the debiased estimator

Gang Cheng and Yen-Chi Chen

Full-text: Open access

Abstract

In this paper, we propose constructing confidence bands by bootstrapping the debiased kernel density estimator (for density estimation) and the debiased local polynomial regression estimator (for regression analysis). The idea of using a debiased estimator was recently employed by Calonico et al. (2018b) to construct a confidence interval for the density function (and regression function) at a given point by explicitly estimating stochastic variations. We extend their idea of using the debiased estimator and further propose a bootstrap approach for constructing simultaneous confidence bands. This modified method has the advantage that the smoothing bandwidth can be chosen easily with conventional bandwidth selectors, and the resulting confidence band is asymptotically valid. We prove the validity of the bootstrap confidence band and generalize it to density level sets and inverse regression problems. Simulation studies confirm the validity of the proposed confidence bands/sets. We apply our approach to an astronomy dataset to show its applicability.
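
To make the idea in the abstract concrete, the following is a minimal sketch of a bootstrap confidence band around a debiased kernel density estimator. It is an illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, uses the same bandwidth for the density estimate and for the second-derivative (bias) estimate, and picks the bandwidth with the normal-reference rule of thumb. The function names `debiased_kde` and `bootstrap_band` are hypothetical.

```python
import math
import random
import statistics


def debiased_kde(x, data, h):
    """Debiased Gaussian KDE at point x (assumption: one bandwidth h is used
    for both the density estimate and the second-derivative bias estimate)."""
    total = 0.0
    for xi in data:
        u = (x - xi) / h
        phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        # K(u) - 0.5 * K''(u), where K''(u) = (u^2 - 1) * phi(u) for the Gaussian kernel
        total += phi * (3.0 - u * u) / 2.0
    return total / (len(data) * h)


def bootstrap_band(data, grid, h, alpha=0.05, n_boot=200, seed=0):
    """Simultaneous band: debiased estimate +/- bootstrap quantile of the sup deviation."""
    rng = random.Random(seed)
    center = [debiased_kde(x, data, h) for x in grid]
    sup_devs = []
    for _ in range(n_boot):
        # Resample the data with replacement (empirical bootstrap).
        resample = rng.choices(data, k=len(data))
        sup_devs.append(
            max(abs(debiased_kde(x, resample, h) - c) for x, c in zip(grid, center))
        )
    sup_devs.sort()
    # Empirical (1 - alpha) quantile of the bootstrap sup deviations.
    idx = min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)
    t = sup_devs[idx]
    lower = [c - t for c in center]
    upper = [c + t for c in center]
    return lower, upper, t


# Usage on simulated data (assumption: standard normal sample, rule-of-thumb bandwidth).
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
h = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
grid = [-3.0 + 0.2 * i for i in range(31)]
lower, upper, half_width = bootstrap_band(sample, grid, h, n_boot=100)
```

The band has constant half-width `half_width` over the grid, which is what makes it simultaneous rather than pointwise. Note that the effective kernel in this sketch takes negative values for |u| > √3, so the debiased estimate can dip slightly below zero in the tails.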

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 2194-2256.

Dates
Received: June 2018
First available in Project Euclid: 28 June 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1561687408

Digital Object Identifier
doi:10.1214/19-EJS1575

Mathematical Reviews number (MathSciNet)
MR3980957

Zentralblatt MATH identifier
07080071

Subjects
Primary: 62G15: Tolerance and confidence regions
Secondary: 62G09, 62G07, 62G08

Keywords
Kernel density estimator; local polynomial regression; level set; inverse regression; confidence set; bootstrap

Rights
Creative Commons Attribution 4.0 International License.

Citation

Cheng, Gang; Chen, Yen-Chi. Nonparametric inference via bootstrapping the debiased estimator. Electron. J. Statist. 13 (2019), no. 1, 2194–2256. doi:10.1214/19-EJS1575. https://projecteuclid.org/euclid.ejs/1561687408


References

  • J. Abrevaya, Y.-C. Hsu, and R. P. Lieli. Estimating conditional average treatment effects. Journal of Business & Economic Statistics, 33(4):485–505, 2015.
  • J. K. Adelman-McCarthy, M. A. Agüeros, S. S. Allam, C. A. Prieto, K. S. Anderson, S. F. Anderson, J. Annis, N. A. Bahcall, C. Bailer-Jones, I. K. Baldry, et al. The sixth data release of the Sloan Digital Sky Survey. The Astrophysical Journal Supplement Series, 175(2):297, 2008.
  • R. Bahadur. A note on quantiles in large samples. The Annals of Mathematical Statistics, 37(3):577–580, 1966.
  • O. Bartalotti, G. Calhoun, and Y. He. Bootstrap confidence intervals for sharp regression discontinuity designs. In Regression Discontinuity Designs: Theory and Applications, pages 421–453. Emerald Publishing Limited, 2017.
  • S. M. Berry, R. J. Carroll, and D. Ruppert. Bayesian smoothing and regression splines for measurement error problems. Journal of the American Statistical Association, 97(457):160–169, 2002.
  • M. Birke, N. Bissantz, and H. Holzmann. Confidence bands for inverse regression models. Inverse Problems, 26(11):115020, 2010.
  • N. Bissantz and M. Birke. Asymptotic normality and confidence intervals for inverse regression models with convolution-type operators. Journal of Multivariate Analysis, 100(10):2364–2375, 2009.
  • S. Bjerve, K. A. Doksum, and B. S. Yandell. Uniform confidence bounds for regression based on a simple moving average. Scandinavian Journal of Statistics, pages 159–169, 1985.
  • M. R. Blanton, D. J. Schlegel, M. A. Strauss, J. Brinkmann, D. Finkbeiner, M. Fukugita, J. E. Gunn, D. W. Hogg, Ž. Ivezić, G. Knapp, et al. New York University value-added galaxy catalog: a galaxy catalog based on new public surveys. The Astronomical Journal, 129(6):2562, 2005.
  • P. J. Brown. Measurement, regression, and calibration. 1993.
  • J. L. O. Cabrera. locpol: Kernel local polynomial regression. URL http://mirrors.ustc.edu.cn/CRAN/web/packages/locpol/index.html, 2018.
  • B. Cadre. Kernel estimation of density level sets. Journal of Multivariate Analysis, 97(4):999–1023, 2006.
  • S. Calonico, M. D. Cattaneo, and M. H. Farrell. On the effect of bias estimation on coverage accuracy in nonparametric inference. arXiv preprint arXiv:1508.02973, 2015.
  • S. Calonico, M. D. Cattaneo, and M. H. Farrell. Coverage error optimal confidence intervals. arXiv preprint arXiv:1808.01398, 2018a.
  • S. Calonico, M. D. Cattaneo, and M. H. Farrell. On the effect of bias estimation on coverage accuracy in nonparametric inference. Journal of the American Statistical Association, pages 1–13, 2018b.
  • G. Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.
  • L. Cavalier. Nonparametric estimation of regression level sets. Statistics: A Journal of Theoretical and Applied Statistics, 29(2):131–160, 1997.
  • J. Chacón, T. Duong, and M. Wand. Asymptotics for general multivariate kernel density derivative estimators. Statistica Sinica, 2011.
  • F. Chazal, B. T. Fasy, F. Lecci, A. Rinaldo, and L. Wasserman. Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry, page 474. ACM, 2014.
  • S. X. Chen. Empirical likelihood confidence intervals for nonparametric density estimation. Biometrika, 83(2):329–341, 1996.
  • Y.-C. Chen. Generalized cluster trees and singular measures. arXiv preprint arXiv:1611.02762, 2016.
  • Y.-C. Chen, C. R. Genovese, and L. Wasserman. Asymptotic theory for density ridges. The Annals of Statistics, 43(5):1896–1928, 2015a.
  • Y.-C. Chen, S. Ho, P. E. Freeman, C. R. Genovese, and L. Wasserman. Cosmic web reconstruction through density ridges: method and algorithm. Monthly Notices of the Royal Astronomical Society, 454(1):1140–1156, 2015b.
  • Y.-C. Chen, S. Ho, J. Brinkmann, P. E. Freeman, C. R. Genovese, D. P. Schneider, and L. Wasserman. Cosmic web reconstruction through density ridges: catalogue. Monthly Notices of the Royal Astronomical Society, page stw1554, 2016a.
  • Y.-C. Chen, S. Ho, R. Mandelbaum, N. A. Bahcall, J. R. Brownstein, P. E. Freeman, C. R. Genovese, D. P. Schneider, and L. Wasserman. Detecting effects of filaments on galaxy properties in the Sloan Digital Sky Survey III. Monthly Notices of the Royal Astronomical Society, page stw3127, 2016b.
  • Y.-C. Chen, C. R. Genovese, and L. Wasserman. Density level sets: Asymptotics, inference, and visualization. Journal of the American Statistical Association, pages 1–13, 2017.
  • V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819, 2013.
  • V. Chernozhukov, D. Chetverikov, and K. Kato. Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics, 42(5):1787–1818, 2014a.
  • V. Chernozhukov, D. Chetverikov, and K. Kato. Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, pages 1–24, 2014b.
  • V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximation of suprema of empirical processes. The Annals of Statistics, 42(4):1564–1597, 2014c.
  • V. Chernozhukov, D. Chetverikov, and K. Kato. Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings. Stochastic Processes and their Applications, 2016.
  • V. Chernozhukov, D. Chetverikov, K. Kato, et al. Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45(4):2309–2352, 2017.
  • H. D. Chiang, Y.-C. Hsu, and Y. Sasaki. A unified robust bootstrap method for sharp/fuzzy mean/quantile regression discontinuity/kink designs. arXiv preprint arXiv:1702.04430, 2017.
  • N. B. Cowan and Ž. Ivezić. The environment of galaxies at low redshift. The Astrophysical Journal Letters, 674(1):L13, 2008.
  • A. Cuevas, W. González-Manteiga, and A. Rodríguez-Casal. Plug-in estimation of general level sets. Australian & New Zealand Journal of Statistics, 48(1):7–19, 2006.
  • T. Duong. Local significant differences from nonparametric two-sample tests. Journal of Nonparametric Statistics, 25(3):635–645, 2013.
  • T. Duong and M. L. Hazelton. Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics, 32(3):485–506, 2005.
  • T. Duong, I. Koch, and M. Wand. Highest density difference region estimation with application to flow cytometric data. Biometrical Journal, 51(3):504–521, 2009.
  • T. Duong et al. ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7):1–16, 2007.
  • H. Edelsbrunner and D. Morozov. Persistent homology: theory and practice. In Proceedings of the European Congress of Mathematics, pages 31–50, 2012.
  • B. Efron. Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7(1):1–26, 1979.
  • U. Einmahl and D. M. Mason. Uniform in bandwidth consistency of kernel-type function estimators. The Annals of Statistics, 33(3):1380–1403, 2005.
  • D. J. Eisenstein, D. H. Weinberg, E. Agol, H. Aihara, C. A. Prieto, S. F. Anderson, J. A. Arns, É. Aubourg, S. Bailey, E. Balbinot, et al. SDSS-III: Massive spectroscopic surveys of the distant universe, the Milky Way, and extra-solar planetary systems. The Astronomical Journal, 142(3):72, 2011.
  • R. L. Eubank and P. L. Speckman. Confidence bands in nonparametric regression. Journal of the American Statistical Association, 88(424):1287–1301, 1993.
  • J. Fan. Local linear regression smoothers and their minimax efficiencies. The Annals of Statistics, pages 196–216, 1993.
  • J. Fan and I. Gijbels. Local polynomial modelling and its applications: monographs on statistics and applied probability 66, volume 66. CRC Press, 1996.
  • B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, and A. Singh. Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339, 2014.
  • D. A. Freedman. Bootstrapping regression models. The Annals of Statistics, 9(6):1218–1228, 1981.
  • C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. On the path density of a gradient field. The Annals of Statistics, 37(6A):3236–3271, 2009.
  • C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. Nonparametric ridge estimation. The Annals of Statistics, 42(4):1511–1545, 2014.
  • E. Giné and A. Guillou. Rates of strong uniform consistency for multivariate kernel density estimators. In Annales de l’Institut Henri Poincare (B) Probability and Statistics, volume 38, pages 907–921. Elsevier, 2002.
  • M.-A. Gruet. A nonparametric calibration analysis. The Annals of Statistics, 24(4):1474–1492, 1996.
  • R. Grützbauch, C. J. Conselice, J. Varela, K. Bundy, M. C. Cooper, R. Skibba, and C. N. Willmer. How does galaxy environment matter? The relationship between galaxy environments, colour and stellar mass at $0.4<z<1$ in the Palomar/DEEP2 survey. Monthly Notices of the Royal Astronomical Society, 411(2):929–946, 2011.
  • P. Hall. Large sample optimality of least squares cross-validation in density estimation. Annals of Statistics, 11(4):1156–1174, 1983.
  • P. Hall. On bootstrap confidence intervals in nonparametric regression. The Annals of Statistics, 20(2):695–711, 1992a.
  • P. Hall. Effect of bias estimation on coverage accuracy of bootstrap confidence intervals for a probability density. The Annals of Statistics, 20(2):675–694, 1992b.
  • P. Hall and J. Horowitz. A simple bootstrap method for constructing nonparametric confidence bands for functions. The Annals of Statistics, 41(4):1892–1921, 2013.
  • P. Hall and A. B. Owen. Empirical likelihood confidence bands in density estimation. Journal of Computational and Graphical Statistics, 2(3):273–289, 1993.
  • W. Härdle and A. W. Bowman. Bootstrapping in nonparametric regression: local adaptive smoothing and confidence bands. Journal of the American Statistical Association, 83(401):102–110, 1988.
  • W. Härdle and J. Marron. Bootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, 19(2):778–796, 1991.
  • W. Härdle, S. Huet, and E. Jolivet. Better bootstrap confidence intervals for regression curve estimation. Statistics: A Journal of Theoretical and Applied Statistics, 26(4):287–306, 1995.
  • W. Härdle, S. Huet, E. Mammen, and S. Sperlich. Bootstrap inference in semiparametric generalized additive models. Econometric Theory, 20(02):265–300, 2004.
  • D. W. Hogg, M. R. Blanton, D. J. Eisenstein, J. E. Gunn, D. J. Schlegel, I. Zehavi, N. A. Bahcall, J. Brinkmann, I. Csabai, D. P. Schneider, et al. The overdensities of galaxy environments as a function of luminosity and color. The Astrophysical Journal Letters, 585(1):L5, 2003.
  • Y.-C. Hsu. Consistent tests for conditional treatment effects. Technical report, Institute of Economics, Academia Sinica, Taipei, Taiwan, 2013.
  • A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15(1):2869–2909, 2014.
  • J. Kim, Y.-C. Chen, S. Balakrishnan, A. Rinaldo, and L. Wasserman. Statistical inference for cluster trees. In Advances in Neural Information Processing Systems, pages 1831–1839, 2016.
  • E. Kong, O. Linton, and Y. Xia. Uniform Bahadur representation for local polynomial estimates of m-regression and its application to the additive model. Econometric Theory, 26(05):1529–1564, 2010.
  • T. Laloe and R. Servien. Nonparametric estimation of regression level sets. Journal of the Korean Statistical Society, 2013.
  • I. Lavagnini and F. Magno. A statistical overview on univariate calibration, inverse regression, and detection limits: application to gas chromatography/mass spectrometry technique. Mass Spectrometry Reviews, 26(1):1–18, 2007.
  • S. Lee and Y.-J. Whang. Nonparametric tests of conditional treatment effects. 2009.
  • Q. Li and J. Racine. Cross-validated local linear nonparametric regression. Statistica Sinica, pages 485–512, 2004.
  • C. Loader. Locfit: local regression, likelihood and density estimation. R package version 1.5-9.1. Merck, Kenilworth, NJ: http://CRAN.R-project.org/package=locfit, 2013.
  • Y. Ma and X.-H. Zhou. Treatment selection in a randomized clinical trial via covariate-specific treatment effect curves. Statistical Methods in Medical Research, page 0962280214541724, 2014.
  • E. Mammen and W. Polonik. Confidence regions for level sets. Journal of Multivariate Analysis, 122:202–214, 2013.
  • J. S. Marron and M. P. Wand. Exact mean integrated squared error. The Annals of Statistics, pages 712–736, 1992.
  • E. A. Nadaraya. On estimating regression. Theory of Probability & Its Applications, 9(1):141–142, 1964.
  • M. H. Neumann. Automatic bandwidth choice and confidence intervals in nonparametric regression. The Annals of Statistics, 23(6):1937–1959, 1995.
  • M. H. Neumann and J. Polzehl. Simultaneous bootstrap confidence bands in nonparametric regression. Journal of Nonparametric Statistics, 9(4):307–333, 1998.
  • N. Padmanabhan, D. J. Schlegel, D. P. Finkbeiner, J. Barentine, M. R. Blanton, H. J. Brewington, J. E. Gunn, M. Harvanek, D. W. Hogg, Ž. Ivezić, et al. An improved photometric calibration of the Sloan Digital Sky Survey imaging data. The Astrophysical Journal, 674(2):1217, 2008.
  • W. Polonik. Measuring mass concentrations and estimating density contour clusters-an excess mass approach. The Annals of Statistics, pages 855–881, 1995.
  • W. Qiao. Asymptotics and optimal bandwidth selection for nonparametric estimation of density level sets. arXiv preprint arXiv:1707.09697, 2017.
  • A. Rinaldo, A. Singh, R. Nugent, and L. Wasserman. Stability of density-based clustering. The Journal of Machine Learning Research, 13(1):905–948, 2012.
  • J. P. Romano. Bootstrapping the mode. Annals of the Institute of Statistical Mathematics, 40(3):565–586, 1988.
  • S. R. Sain, K. A. Baggerly, and D. W. Scott. Cross-validation of multivariate densities. Journal of the American Statistical Association, 89(427):807–817, 1994.
  • D. W. Scott. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, 2015.
  • S. Sheather and C. Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 53(3):683–690, 1991.
  • S. J. Sheather. Density estimation. Statistical Science, 19(4):588–597, 2004.
  • B. W. Silverman. Density estimation for statistics and data analysis. Chapman and Hall, 1986.
  • J. Sun, C. R. Loader, et al. Simultaneous confidence bands for linear regression and smoothing. The Annals of Statistics, 22(3):1328–1345, 1994.
  • R. Tang, M. Banerjee, and G. Michailidis. A two-stage hybrid procedure for estimating an inverse regression function. The Annals of Statistics, 39(2):956–989, 2011.
  • C. Tortora, N. Napolitano, V. Cardone, M. Capaccioli, P. Jetzer, and R. Molinaro. Colour and stellar population gradients in galaxies: correlation with mass. Monthly Notices of the Royal Astronomical Society, 407(1):144–162, 2010.
  • A. B. Tsybakov. On nonparametric estimation of density level sets. The Annals of Statistics, 25(3):948–969, 1997.
  • S. Van de Geer, P. Bühlmann, Y. Ritov, and R. Dezeure. On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3):1166–1202, 2014.
  • A. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
  • L. Wasserman. All of nonparametric statistics. Springer-Verlag New York, Inc., 2006.
  • L. Wasserman. Topological data analysis. Annual Review of Statistics and Its Application, 5:501–532, 2018.
  • S. Weisberg. Applied linear regression, volume 528. John Wiley & Sons, 2005.
  • C. Wu. Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics, 14(4):1261–1295, 1986.
  • Y. Xia. Bias-corrected confidence bands in nonparametric regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(4):797–811, 1998.
  • Y. Xia and W. Li. Asymptotic behavior of bandwidth selected by the cross-validation method for local polynomial fitting. Journal of Multivariate Analysis, 83(2):265–287, 2002.
  • D. G. York, J. Adelman, J. E. Anderson Jr, S. F. Anderson, J. Annis, N. A. Bahcall, J. Bakken, R. Barkhouser, S. Bastian, E. Berman, et al. The Sloan Digital Sky Survey: Technical summary. The Astronomical Journal, 120(3):1579, 2000.
  • C.-H. Zhang and S. S. Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):217–242, 2014.