## The Annals of Statistics

### A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences

Norbert Henze

#### Abstract

For independent $d$-variate random samples $X_1, \ldots, X_{n_1}$ i.i.d. $f(x)$ and $Y_1, \ldots, Y_{n_2}$ i.i.d. $g(x)$, where the densities $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$ nearest neighbor comparisons in which an observation and its neighbor belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$ as $\min(n_1, n_2) \rightarrow \infty$ and $n_1/(n_1 + n_2) \rightarrow \tau$ with $0 < \tau < 1$ does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance (generated by a norm on $\mathbb{R}^d$) for determining nearest neighbors, and it generalizes to the multisample situation.
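To make the statistic concrete, the following is a minimal sketch, not the paper's procedure. It pools the $n_1 + n_2$ observations and finds, for each one, its $k$ nearest neighbors among the other pooled points under a distance induced by an $\ell_p$ norm. $T$ is then the number of (observation, neighbor) pairs that come from the same sample. The paper calibrates $T$ through its limiting normal distribution, whose mean and variance are not reproduced here. As a swapped-in alternative, this sketch computes a permutation p-value by shuffling the sample labels over the fixed nearest neighbor graph and rejecting for large $T$. The helper names `minkowski`, `knn_indices`, `coincidences`, and `nn_two_sample_test` are hypothetical. Ties between distances are assumed to be negligible and are broken by index.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def minkowski(p: float = 2.0) -> Distance:
    """Distance induced by the l_p norm on R^d (p = math.inf gives the max norm)."""
    if p < 1:
        raise ValueError("p must be >= 1 for l_p to be a norm")

    def dist(a: Point, b: Point) -> float:
        diffs = [abs(x - y) for x, y in zip(a, b)]
        if math.isinf(p):
            return max(diffs)
        return sum(t ** p for t in diffs) ** (1.0 / p)

    return dist


def knn_indices(points: Sequence[Point], k: int, dist: Distance) -> List[List[int]]:
    """For each point, the indices of its k nearest neighbours among the other points.

    Ties (probability zero when both samples have densities) are broken by index.
    """
    n = len(points)
    result = []
    for i in range(n):
        ranked = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def coincidences(neighbors: List[List[int]], labels: Sequence[int]) -> int:
    """T: number of (point, neighbour) pairs whose members carry the same sample label."""
    return sum(1 for i, nbrs in enumerate(neighbors) for j in nbrs if labels[i] == labels[j])


def nn_two_sample_test(
    xs: Sequence[Point],
    ys: Sequence[Point],
    k: int = 1,
    n_perm: int = 999,
    seed: int = 0,
    dist: Optional[Distance] = None,
) -> Tuple[int, float]:
    """Return (T, permutation p-value); large T counts as evidence against H0: f = g."""
    pooled = list(xs) + list(ys)
    if not 1 <= k < len(pooled):
        raise ValueError("need 1 <= k < n1 + n2")
    neighbors = knn_indices(pooled, k, dist or minkowski(2.0))
    labels = [0] * len(xs) + [1] * len(ys)
    t_obs = coincidences(neighbors, labels)
    # The k-NN graph of the pooled sample does not depend on the labels,
    # so it is computed once and only the labels are reshuffled.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if coincidences(neighbors, labels) >= t_obs:
            hits += 1
    return t_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    gen = random.Random(1)
    xs = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(30)]
    ys = [(gen.gauss(1.5, 1), gen.gauss(0, 1)) for _ in range(30)]  # shifted mean
    print(nn_two_sample_test(xs, ys, k=3, n_perm=499))
```

Because the graph is fixed and only the labels move, each permutation costs $O((n_1 + n_2)k)$ once the neighbors are known. Passing `minkowski(math.inf)` or `minkowski(1.0)` swaps in another norm-generated distance, which the abstract states the result covers.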

#### Article information

Source
Ann. Statist., Volume 16, Number 2 (1988), 772-783.

Dates
First available in Project Euclid: 12 April 2007

https://projecteuclid.org/euclid.aos/1176350835

Digital Object Identifier
doi:10.1214/aos/1176350835

Mathematical Reviews number (MathSciNet)
MR947577

Zentralblatt MATH identifier
0645.62062

JSTOR