## The Annals of Mathematical Statistics

- Ann. Math. Statist.
- Volume 29, Number 4 (1958), 995-1010.

### A High Dimensional Two Sample Significance Test

#### Abstract

The classical multivariate two-sample significance test based on Hotelling's $T^2$ is undefined when the number $k$ of variables exceeds the number of within-sample degrees of freedom available for estimation of variances and covariances. Addition of an a priori Euclidean metric to the affine $k$-space assumed by the classical method leads to an alternative approach to the same problem. A test statistic $F$, which is the ratio of two mean square distances, is proposed, and three methods of attaching a significance level to $F$ are described. The third method is considered in detail and leads to a "non-exact" significance test where the null hypothesis distribution of $F$ depends, in approximation, on a single unknown parameter $r$ for which an estimate must be substituted. Approximate distribution theory leads to two independent estimates of $r$ based on nearly sufficient statistics, and these may be combined to yield a single estimate. A test of $F$ nominally at the 5% level, but based on an estimate of $r$ rather than $r$ itself, has a true significance level which is a function of $r$. This function is investigated and shown to be quite near 5%. The sensitivity of the test to a parameter measuring statistical distance between population means is discussed, and it is shown that arbitrarily small differences in each individual variable can result in a detectable overall difference provided the number of variables (or, more precisely, $r$) can be made sufficiently large. This sensitivity discussion has stated implications for the a priori choice of metric in $k$-space. Finally, a geometrical description of the case of large $r$ is presented.
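The abstract does not spell out the formula for $F$. As a rough illustration only, the sketch below assumes the form commonly attributed to this test in later literature: the squared Euclidean distance between the two sample means, scaled by $n_1 n_2/(n_1+n_2)$, divided by the pooled within-sample sum of squared distances over its $n_1+n_2-2$ degrees of freedom. The function name `dempster_f` and the toy data are hypothetical, and the sketch does not estimate $r$ or attach a significance level.

```python
def dempster_f(x, y):
    """Ratio of between-sample to within-sample mean square distance.

    x, y: lists of equal-length observation vectors (lists of floats).
    Assumed form (not taken from the abstract):
        F = [n1*n2/(n1+n2)] * ||mean_x - mean_y||^2
            / [pooled within-sample squared distance / (n1+n2-2)]
    """
    n1, n2 = len(x), len(y)
    k = len(x[0])

    def mean(rows):
        return [sum(row[j] for row in rows) / len(rows) for j in range(k)]

    def sq_dist(a, b):
        # Squared Euclidean distance under the identity metric.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    mean_x, mean_y = mean(x), mean(y)

    # Between-sample squared distance (one degree of freedom).
    between = (n1 * n2 / (n1 + n2)) * sq_dist(mean_x, mean_y)

    # Within-sample squared distances, pooled over both samples.
    within = sum(sq_dist(row, mean_x) for row in x)
    within += sum(sq_dist(row, mean_y) for row in y)
    if within == 0:
        raise ValueError("no within-sample variation; F is undefined")

    return between / (within / (n1 + n2 - 2))


# Toy data with k = 3 variables but only n1 + n2 - 2 = 2 within-sample
# degrees of freedom, the setting in which Hotelling's T^2 is undefined.
x = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
y = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
print(dempster_f(x, y))  # 2.5
```

Only distances enter the computation, so no covariance matrix has to be inverted; this is why the statistic remains defined when $k$ exceeds the within-sample degrees of freedom. The result depends on the metric chosen, which the sketch fixes as the identity.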

#### Article information

**Source**

Ann. Math. Statist., Volume 29, Number 4 (1958), 995-1010.

**Dates**

First available in Project Euclid: 27 April 2007

**Permanent link to this document**

https://projecteuclid.org/euclid.aoms/1177706437

**Digital Object Identifier**

doi:10.1214/aoms/1177706437

**Mathematical Reviews number (MathSciNet)**

MR112207

**Zentralblatt MATH identifier**

0226.62014

#### Citation

Dempster, A. P. A High Dimensional Two Sample Significance Test. Ann. Math. Statist. 29 (1958), no. 4, 995-1010. doi:10.1214/aoms/1177706437. https://projecteuclid.org/euclid.aoms/1177706437