Electronic Journal of Statistics

Nonparametric confidence regions for level sets: Statistical properties and geometry

Wanli Qiao and Wolfgang Polonik

Full-text: Open access

Abstract

This paper studies and critically discusses the construction of nonparametric confidence regions for density level sets. Methodologies based on both vertical variation and horizontal variation are considered. The investigations provide theoretical insight into the behavior of these confidence regions via large sample theory. We also discuss the geometric relationships underlying the construction of horizontal and vertical methods, and how finite sample performance of these confidence regions is influenced by geometric or topological aspects. These discussions are supported by numerical studies.

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 985-1030.

Dates
Received: July 2018
First available in Project Euclid: 30 March 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1553911237

Digital Object Identifier
doi:10.1214/19-EJS1543

Mathematical Reviews number (MathSciNet)
MR3934621

Zentralblatt MATH identifier
07056145

Subjects
Primary: 62G20: Asymptotic properties
Secondary: 62G05: Estimation

Keywords
Extreme value distribution; level sets; nonparametric surface estimation; integral curves; kernel density estimation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Qiao, Wanli; Polonik, Wolfgang. Nonparametric confidence regions for level sets: Statistical properties and geometry. Electron. J. Statist. 13 (2019), no. 1, 985--1030. doi:10.1214/19-EJS1543. https://projecteuclid.org/euclid.ejs/1553911237


