## The Annals of Mathematical Statistics

### Fixed Alternatives and Wald's Formulation of the Noncentral Asymptotic Behavior of the Likelihood Ratio Statistic

T. W. F. Stroud

#### Abstract

Let $X$ be a random vector taking values in $p$-dimensional Euclidean space $\mathscr{E}^p$, with density $f(x; \theta)$. The parameter $\theta$ belongs to a subset $\Theta$ of a Euclidean space $\mathscr{E}^q$ and is unknown. Let $g$ be a function over the parameter space having continuous first partial derivatives and taking values in $\mathscr{E}^r$ ($r \leqq q$). To test the hypothesis $g(\theta) = 0$ against the alternative $g(\theta) \neq 0$ using a sample of $n$ independent observations of $X$, one frequently uses the Neyman-Pearson generalized likelihood ratio test statistic $\lambda_n$. The limiting distribution of $-2\ln\lambda_n$ under the null hypothesis, as $n \rightarrow \infty$, was shown by Wilks (1938) to be chi-square with $r$ degrees of freedom (assuming regularity conditions). If $\{\theta_n\}$ is a sequence of alternatives converging to a point of the null hypothesis at the rate $n^{-\frac{1}{2}}$, the limiting distribution is noncentral chi-square with noncentrality parameter equal to the limit of $n\lbrack g(\theta_n)\rbrack' \Sigma^{-1}_g(\theta_n)\lbrack g(\theta_n)\rbrack$, where $\Sigma_g(\theta)$ is the asymptotic covariance matrix of the quantity $n^{\frac{1}{2}}\lbrack g(\hat{\theta}) - g(\theta)\rbrack$ as $n \rightarrow \infty$ with $\theta$ fixed ($\hat{\theta}$ denoting the maximum-likelihood estimator of $\theta$ based on sample size $n$). This noncentral convergence was first proved by Wald (1943), along with a number of other results, on the basis of some rather severe uniformity conditions. Davidson and Lever (1970) have proved the result using more intuitive assumptions. Feder (1968) has obtained an asymptotic noncentral chi-square distribution for the case where both the hypothesis and alternative regions are cones in $\Theta$; this is essentially a generalization of $g(\theta) = 0$ versus $g(\theta) \neq 0$, since the hypothesis $g(\theta) = 0$ is locally equivalent to a hyperplane and $g(\theta) \neq 0$ to its complement.
Despite the generality, Feder's assumptions are quite mild compared with Wald's. The noncentral convergence result appears in Wald's paper as a special case of a more general statement entitled "Theorem IX." This theorem states that for $\theta \in \Theta$ and $-\infty < t < \infty$ the relationship \begin{equation*}\tag{1.1}P_\theta\lbrack -2 \ln \lambda_n < t\rbrack - P_\theta\lbrack K_n < t\rbrack \rightarrow 0\end{equation*} holds uniformly in $t$ and $\theta$, where $K_n$ has a noncentral chi-square distribution with $r$ degrees of freedom and noncentrality parameter equal to $n\lbrack g(\theta)\rbrack' \Sigma^{-1}_g(\theta)\lbrack g(\theta)\rbrack$. This formulation of Wald is too strong. It will be shown by counterexample that, if $\theta$ is held fixed while $n \rightarrow \infty$, relationship (1.1) fails to hold uniformly in $t$. The counterexample is that of testing the value of the mean of a normal distribution with unknown mean and variance. Wald's proof of Theorem IX treats two cases separately: case (i), where $\theta_n$ approaches the null hypothesis set at the rate $n^{-\frac{1}{2}}$ or faster, and case (ii), where it does not. The proof of (1.1) in case (i) requires convergence of $\theta_n$ at the rate $n^{-\frac{1}{2}}$ in order that the Taylor series expansion of the logarithm behave nicely. In case (ii) there is no reason at all to believe the distribution of $K_n$ to be a good approximation to that of $-2\ln\lambda_n$. From Wald's paper (page 480, line following (212)) one gets the impression that Wald felt that the statement of uniform convergence of (1.1) in case (ii) was trivial, since pointwise convergence is trivial (because both terms tend to zero for fixed $t$). But, since $K_n$ does not converge in distribution to a random variable in case (ii), there is really no reason why pointwise convergence should imply uniform convergence.
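The gap between pointwise and uniform convergence can be made concrete in the normal counterexample. The block below is only a sketch, not the argument of the paper: it assumes the truncation of $\sigma$ is inactive (which holds with probability tending to one when the restricted and unrestricted variance estimates both converge to interior points), and the symbols $\bar{X}_n$, $\hat{\sigma}_n^2$, $Z$, $\delta$ and $c$ are introduced here for illustration. For $g(\theta) = \mu$ one has $\Sigma_g(\theta) = \sigma^2$, so the noncentrality parameter of $K_n$ is $n\mu^2/\sigma^2$.

```latex
\begin{align*}
-2\ln\lambda_n &= n\ln\Bigl(1 + \frac{\bar{X}_n^2}{\hat{\sigma}_n^2}\Bigr),
  \qquad \hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2, \\
K_n &\overset{d}{=} \bigl(Z + n^{\frac{1}{2}}\delta\bigr)^2
  = n\delta^2 + 2n^{\frac{1}{2}}\delta Z + Z^2,
  \qquad Z \sim \mathscr{N}(0, 1),\quad \delta = \mu/\sigma, \\
n^{-1}(-2\ln\lambda_n) &\rightarrow \ln(1 + \delta^2)
  \quad\text{and}\quad n^{-1}K_n \rightarrow \delta^2
  \quad\text{in probability, for fixed } \theta = (\mu, \sigma).
\end{align*}
% Since ln(1 + delta^2) < delta^2 whenever delta != 0, pick any constant c with
%   ln(1 + delta^2) < c < delta^2   and set   t_n = n c.
% Then P[-2 ln lambda_n < t_n] -> 1 while P[K_n < t_n] -> 0,
% so sup_t | P[-2 ln lambda_n < t] - P[K_n < t] | does not tend to 0.
```

Both probabilities in (1.1) still tend to zero for each fixed $t$, which is why the pointwise statement holds while the uniform one does not.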
In the same paper, Wald (1943) also described a test procedure based only on the unrestricted maximum-likelihood estimator $\hat{\theta}_n$. This procedure rejects for large values of the statistic $Q_n = n\lbrack g(\hat{\theta}_n)\rbrack' \Sigma^{-1}_g(\hat{\theta}_n)\lbrack g(\hat{\theta}_n)\rbrack$. Wald claimed in his paper that (1.1) again holds uniformly in $t$ and $\theta$ if $-2\ln\lambda_n$ is replaced by $Q_n$. This claim too is false, in the stated generality, as the same counterexample will demonstrate. Keeping $\theta$ as a fixed alternative while $n \rightarrow \infty$ has the disadvantage that the limiting behavior of each of the quantities $-2\ln\lambda_n$, $Q_n$ and $K_n$ is degenerate in the sense that the probability mass moves out to infinity with increasing $n$. However, statement (1.1), uniform in $t$ for fixed $\theta$, has meaning here since both $-2\ln\lambda_n$ (or $Q_n$) and $K_n$ may be related to quantities with genuine limiting normal distributions which must be identical or at least very similar in order for (1.1) to be uniform in $t$. The precise result is embodied in a theorem presented in Section 2 of this paper. In Sections 3 and 4 we consider the case of $X$ normally distributed with mean $\mu$ and variance $\sigma^2$, where $-\infty < \mu < \infty$, $0 < \sigma_1 < \sigma < \sigma_2$, and the hypothesis to be tested is $\mu = 0$. It is shown in Sections 3 and 4, respectively, that for this problem the relationships $P_\theta\lbrack Q_n < t\rbrack - P_\theta\lbrack K_n < t\rbrack \rightarrow 0$ and $P_\theta\lbrack -2\ln\lambda_n < t\rbrack - P_\theta\lbrack K_n < t\rbrack \rightarrow 0$ fail to be uniform in $t$ when $\theta = (\mu, \sigma)$ is fixed and satisfies $\mu \neq 0$, $\sigma_1^2 < \sigma^2 < \sigma_2^2 - \mu^2$. The space of values of $\sigma$ has been truncated in order to satisfy Wald's regularity conditions. In the following section boldface letters denote vectors and matrices.
The law of the random vector $\mathbf{x}$ is denoted throughout by $\mathscr{L}(\mathbf{x})$. In particular, $\mathscr{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ refers to a normal law with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. By $\mathscr{L}(\mathbf{x}_n) \rightarrow \mathscr{L}(\mathbf{y})$ or $\mathscr{L}(\mathbf{x}_n) \rightarrow \mathscr{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is meant, respectively, that the law of $\mathbf{x}_n$ converges to the law of $\mathbf{y}$ or to the stated normal law, as $n \rightarrow \infty$. The definitions of the Mann-Wald symbols $O_p$ and $o_p$ may be found in Chernoff (1956, Section 2), as may the statements of some basic results of large-sample theory which are used freely in the proof of the theorem.
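The failure of uniformity in the normal example of Sections 3 and 4 can also be illustrated numerically. The following Python sketch is not taken from the paper: the function names `simulate` and `theory`, the default values of $\mu$, $\sigma$, $n$ and the number of replications are all hypothetical choices, and the truncation $\sigma_1 < \sigma < \sigma_2$ is ignored on the assumption that it is inactive for large $n$. The reference values in `theory` come from a standard delta-method calculation with $\delta = \mu/\sigma$, under which $n^{-\frac{1}{2}}(K_n - n\delta^2)$ has limiting variance $4\delta^2$ while $n^{-\frac{1}{2}}(Q_n - n\delta^2)$ has limiting variance $4\delta^2 + 2\delta^4$.

```python
import math
import random
import statistics


def theory(mu, sigma):
    """Delta-method limits (an assumption of this sketch, not quoted from the paper)."""
    d2 = (mu / sigma) ** 2
    sd_q = math.sqrt(4.0 * d2 + 2.0 * d2 ** 2)
    return {
        "sd_Q": sd_q,                    # limit sd of (Q_n - n d2) / sqrt(n)
        "sd_LR": sd_q / (1.0 + d2),      # limit sd of (-2 ln lambda_n - n ln(1 + d2)) / sqrt(n)
        "sd_K": 2.0 * math.sqrt(d2),     # limit sd of (K_n - n d2) / sqrt(n)
        "center_Q": d2,                  # limit of Q_n / n and of K_n / n
        "center_LR": math.log1p(d2),     # limit of -2 ln lambda_n / n
    }


def simulate(mu=1.0, sigma=1.0, n=10_000, reps=20_000, seed=0):
    """Simulate Q_n, -2 ln lambda_n and K_n at a fixed alternative (mu, sigma)."""
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    delta = mu / sigma
    q, lr, k = [], [], []
    for _ in range(reps):
        # Sample mean of n normal observations: N(mu, sigma^2 / n).
        xbar = rng.gauss(mu, sigma / root_n)
        # MLE of the variance, drawn independently of xbar:
        # n * s2 / sigma^2 ~ chi-square(n - 1) = Gamma(shape=(n - 1)/2, scale=2).
        s2 = sigma ** 2 * rng.gammavariate((n - 1) / 2.0, 2.0) / n
        ratio = xbar ** 2 / s2
        q.append(root_n * ratio)                # Q_n / sqrt(n)
        lr.append(root_n * math.log1p(ratio))   # -2 ln lambda_n / sqrt(n)
        # Noncentral chi-square, 1 df, noncentrality n * delta^2, divided by sqrt(n).
        k.append((rng.gauss(0.0, 1.0) + root_n * delta) ** 2 / root_n)
    return {
        "sd_Q": statistics.pstdev(q),
        "sd_LR": statistics.pstdev(lr),
        "sd_K": statistics.pstdev(k),
        "center_Q": statistics.fmean(q) / root_n,
        "center_LR": statistics.fmean(lr) / root_n,
        "center_K": statistics.fmean(k) / root_n,
    }


if __name__ == "__main__":
    print("simulated:", simulate())
    print("theory:   ", theory(1.0, 1.0))
```

If the simulated spreads of $Q_n$ and $K_n$ differ, as the delta-method values suggest they should whenever $\mu \neq 0$, then the two distribution functions cannot approach each other uniformly in $t$, even though both tend to zero at each fixed $t$. The statistic $-2\ln\lambda_n$ differs from $K_n$ already in its centering.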

#### Article information

Source
Ann. Math. Statist., Volume 43, Number 2 (1972), 447-454.

Dates
First available in Project Euclid: 27 April 2007

https://projecteuclid.org/euclid.aoms/1177692625

Digital Object Identifier
doi:10.1214/aoms/1177692625

Mathematical Reviews number (MathSciNet)
MR307406

Zentralblatt MATH identifier
0238.62023
