The Annals of Statistics

Classification in general finite dimensional spaces with the k-nearest neighbor rule

Sébastien Gadat, Thierry Klein, and Clément Marteau


Abstract

Given an $n$-sample of random vectors $(X_{i},Y_{i})_{1\leq i\leq n}$ whose joint law is unknown, the long-standing problem of supervised classification aims to optimally predict the label $Y$ of a given new observation $X$. In this context, the $k$-nearest neighbor rule is a popular, flexible and intuitive method in non-parametric situations. Although this algorithm is commonly used in the machine learning and statistics communities, little is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^{d}$. This paper is devoted to the study of the statistical properties of the $k$-nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the regression function $\eta(X)=\mathbb{E}[Y|X]$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and derive sharp estimates in the case of the $k$-nearest neighbor rule. Numerical experiments at the end of the paper illustrate the discussion.
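
To make the rule concrete, here is a minimal sketch of the $k$-nearest neighbor plug-in classifier for binary labels $Y\in\{0,1\}$: estimate $\eta(x)=\mathbb{E}[Y|X=x]$ by averaging the labels of the $k$ training points closest to $x$, then predict $1$ when the estimate exceeds $1/2$. The Gaussian design (with unbounded support, as in the setting of the paper), the logistic form of $\eta$ and the choice $k=15$ below are illustrative assumptions, not taken from the paper.

    import numpy as np

    def knn_classify(X_train, y_train, x, k):
        """k-nearest neighbor plug-in rule: estimate eta(x) = E[Y | X = x]
        by the average label among the k nearest training points,
        then threshold the estimate at 1/2."""
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
        nearest = np.argsort(dists)[:k]              # indices of the k nearest neighbors
        eta_hat = y_train[nearest].mean()            # local estimate of eta(x)
        return int(eta_hat >= 0.5)                   # plug-in decision

    # Toy example: Gaussian design on R^d, labels drawn from a smooth eta.
    rng = np.random.default_rng(0)
    n, d, k = 500, 2, 15                             # k = 15 is an illustrative choice
    X = rng.standard_normal((n, d))                  # marginal law with support R^d
    eta = 1.0 / (1.0 + np.exp(-X[:, 0]))             # a smooth regression function (assumed)
    y = (rng.random(n) < eta).astype(int)            # Y | X ~ Bernoulli(eta(X))
    print(knn_classify(X, y, np.array([0.5, -0.2]), k))

In practice, the accuracy of the rule is driven by how $k$ grows with $n$; the paper quantifies this through the smoothness and margin properties of $\eta$ and the tail behavior of the marginal law of $X$.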

Article information

Source
Ann. Statist., Volume 44, Number 3 (2016), 982–1009.

Dates
Received: November 2014
Revised: September 2015
First available in Project Euclid: 11 April 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1460381684

Digital Object Identifier
doi:10.1214/15-AOS1395

Mathematical Reviews number (MathSciNet)
MR3485951

Zentralblatt MATH identifier
1338.62082

Subjects
Primary: 62G05: Estimation; 62F15: Bayesian inference
Secondary: 62G20: Asymptotic properties

Keywords
Supervised classification; rates; $k$-nearest neighbor; plug-in rules

Citation

Gadat, Sébastien; Klein, Thierry; Marteau, Clément. Classification in general finite dimensional spaces with the k-nearest neighbor rule. Ann. Statist. 44 (2016), no. 3, 982–1009. doi:10.1214/15-AOS1395. https://projecteuclid.org/euclid.aos/1460381684

References

  • [1] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Comput. 9 1545–1588.
  • [2] Audibert, J.-Y. and Tsybakov, A. B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608–633.
  • [3] Barndorff-Nielsen, O. E. and Cox, D. R. (1989). Asymptotic Techniques for Use in Statistics. Chapman & Hall, London.
  • [4] Botev, Z. I., Grotowski, J. F. and Kroese, D. P. (2010). Kernel density estimation via diffusion. Ann. Statist. 38 2916–2957.
  • [5] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323–375.
  • [6] Breiman, L. (2001). Random forests. Mach. Learn. 45 5–32.
  • [7] Cannings, T. (2013). Nearest neighbour classification in the tails of a distribution. Preprint.
  • [8] Cérou, F. and Guyader, A. (2006). Nearest neighbor classification in infinite dimension. ESAIM Probab. Stat. 10 340–355 (electronic).
  • [9] Chaudhuri, K. and Dasgupta, S. (2014). Rates of convergence for nearest neighbor classification. In Advances in Neural Information Processing Systems (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence and K. Q. Weinberger, eds.) 27 3437–3445. Curran Associates, Red Hook, NY.
  • [10] Devroye, L. (1981). On the almost everywhere convergence of nonparametric regression function estimates. Ann. Statist. 9 1310–1319.
  • [11] Devroye, L., Györfi, L., Krzyżak, A. and Lugosi, G. (1994). On the strong universal consistency of nearest neighbor regression function estimates. Ann. Statist. 22 1371–1385.
  • [12] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York.
  • [13] Fix, E. and Hodges, J. L. (1951). Discriminatory analysis, nonparametric discrimination, consistency properties. Randolph Field, Texas, Project 21-49-004, Report 4.
  • [14] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139.
  • [15] Gadat, S., Klein, T. and Marteau, C. (2015). Supplement to “Classification in general finite dimensional spaces with the k-nearest neighbor rule.” DOI:10.1214/15-AOS1395SUPP.
  • [16] Goldenshluger, A. and Lepski, O. (2014). On adaptive minimax density estimation on $\mathbb{R}^{d}$. Probab. Theory Related Fields 159 479–543.
  • [17] Györfi, L. (1978). On the rate of convergence of nearest neighbor rules. IEEE Trans. Inform. Theory 24 509–512.
  • [18] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York.
  • [19] Hall, P., Park, B. U. and Samworth, R. J. (2008). Choice of neighbor order in nearest-neighbor classification. Ann. Statist. 36 2135–2152.
  • [20] Hüsler, J., Liu, R. Y. and Singh, K. (2002). A formula for the tail probability of a multivariate normal distribution and its applications. J. Multivariate Anal. 82 422–430.
  • [21] Lecué, G. (2007). Simultaneous adaptation to the margin and to complexity in classification. Ann. Statist. 35 1698–1721.
  • [22] Li, W. V. and Shao, Q.-M. (2001). Gaussian processes: Inequalities, small ball probabilities and applications. In Stochastic Processes: Theory and Methods. Handbook of Statist. 19 533–597. North-Holland, Amsterdam.
  • [23] Lian, H. (2011). Convergence of functional $k$-nearest neighbor regression estimate with functional responses. Electron. J. Stat. 5 31–40.
  • [24] Loustau, S. and Marteau, C. (2015). Minimax fast rates for discriminant analysis with errors in variables. Bernoulli 21 176–208.
  • [25] Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829.
  • [26] Reynaud-Bouret, P., Rivoirard, V. and Tuleau-Malot, C. (2011). Adaptive density estimation: A curse of support? J. Statist. Plann. Inference 141 115–139.
  • [27] Rodríguez Casal, A. (2007). Set estimation under convexity type assumptions. Ann. Inst. Henri Poincaré Probab. Stat. 43 763–774.
  • [28] Samworth, R. J. (2012). Optimal weighted nearest neighbour classifiers. Ann. Statist. 40 2733–2763.
  • [29] Steinwart, I. (2005). Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inform. Theory 51 128–142.
  • [30] Stone, C. J. (1977). Consistent nonparametric regression. Ann. Statist. 5 595–645.
  • [31] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.
  • [32] Vapnik, V. N. (1998). Statistical Learning Theory. Wiley, New York.

Supplemental materials

  • Supplement to “Classification in general finite dimensional spaces with the k-nearest neighbor rule”. The supplement contains some technical results and the proofs of Theorems 4.1, 4.2 and 4.5(i).