Shorter Confidence Intervals for the Mean of a Normal Distribution with Known Variance

John W. Pratt

doi:10.1214/aoms/1177704170

Ann. Math. Statist. 34(2): 574-586 (June, 1963). DOI: 10.1214/aoms/1177704170

Abstract

This paper obtains and explores a family of confidence procedures for the mean of a normal distribution which are, in a certain sense, more efficient than the usual procedure. Let $X$ (possibly a sample mean) have a normal density $\varphi(x; \theta, \sigma^2)$ with unknown mean $\theta$ and known variance $\sigma^2$. Let $R(X)$ be a confidence region for $\theta$ at level $1 - \alpha$. Let $m(R)$ be the length of $R$ if $R$ is an interval; more generally, for any region $R$, let \begin{equation*}\tag{1}m(R) = \int_R d\theta,\end{equation*} which we will also call the length of $R$. Then, following [3], we have \begin{align*}E_{\theta'} \{m(R(X))\} &= \int\int_{\theta \in R(x)} d\theta\, \varphi(x; \theta', \sigma^2)\, dx = \int P_{\theta'} \{\theta \in R(X)\}\, d\theta \\ &= \int_{\theta\neq\theta'} P_{\theta'} \{\theta \in R(X)\}\, d\theta. \tag{2}\end{align*} Thus the expected length of the confidence region $R(X)$ may also be interpreted as the integral over all false values $\theta$ of the probability of covering $\theta$, where the expected length and the probability are both computed under the true value $\theta'$. Whether we are interested in length or in the probability of covering false values, we would like to make (2) small. For a particular $\theta'$, we can minimize (2) as follows. Let $A(\theta)$ be the acceptance region of the family of tests corresponding to $R(X)$, that is, \begin{equation*}\tag{3}X \in A(\theta)\quad\text{if and only if}\quad \theta \in R(X).\end{equation*} Substituting (3) in (2) gives \begin{equation*}\tag{4}E_{\theta'}\{m(R(X))\} = \int P_{\theta'}\{X \in A(\theta)\}\, d\theta = \int_{\theta\neq\theta'} P_{\theta'} \{X \in A(\theta)\}\, d\theta.\end{equation*} For $\theta \neq \theta'$, $1 - P_{\theta'}\{X \in A(\theta)\}$ is the power of the test of the null hypothesis value $\theta$ against the alternative $\theta'$.
Thus we see that the expected length of the confidence region is minimized, when $\theta'$ is the true value, by choosing the test of each null hypothesis value $\theta$ which is most powerful against the alternative $\theta'$. This gives the confidence interval \begin{equation*}\tag{5}\min \{\theta', X - \xi_\alpha\sigma\} \leqq \theta \leqq \max \{\theta', X + \xi_\alpha\sigma\},\end{equation*} where $\xi_\alpha$ is the upper $\alpha$-point (not $\frac{1}{2} \alpha$-point) of the standard normal distribution. Table 1 shows what happens if we guess $\theta = \theta'$ and use the foregoing confidence procedure: our expected length will be considerably less than that of the usual procedure if we guess correctly, but greater if we are wrong by much over $2\sigma$ (for $\alpha = .05$). Since we do not know the true value of $\theta$, we may prefer to minimize not the expected length under a particular $\theta'$, but a weighted average of this. Consider then, for some $W \geqq 0$, the weighted average \begin{equation*}\tag{6} \int W(\theta') E_{\theta'} \{m(R(X))\}\, d\theta' = \int\int P_{\theta'} \{X \in A(\theta)\} W(\theta')\, d\theta'\, d\theta,\end{equation*} where the equality follows from (4). If $W$ is interpreted as the prior density of $\theta$, then (6) is the prior (marginal) expected length of the confidence region $R(X)$, but this interpretation need not be made. The procedure minimizing (6) also corresponds to a most powerful test of each null hypothesis value $\theta$ against a certain alternative distribution, as we shall see explicitly in Section 2. We are concerned with two kinds of question: (A) If we use the minimizing confidence procedure for some $W$, what is the effect of $W$ on the expected length and on the efficiency to be gained by giving up the usual procedure? In particular, how diffuse does $W$ have to get before the gain over the usual procedure is small in terms of the weighted average of the expected length?
(B) Are the minimizing confidence intervals for $W$ more like posterior probability intervals obtained from the prior $W$ than are the usual intervals? Does the use of $W$ in selecting a confidence procedure largely eliminate the difference between confidence intervals and posterior probability intervals? Specifically, we will introduce a normal weight function $W(\theta') = \varphi(\theta'; \theta_0, \omega^2)$ and obtain the minimizing confidence procedure $R_\omega(X)$. This procedure reduces to (5) with $\theta' = \theta_0$ when $\omega = 0$ and to the usual procedure when $\omega = \infty$; for $\omega = \sigma$ and $\alpha = .05$ it is given by Figure 1 or Table 5. Then to answer (A), Table 2 gives the expected length of the minimizing procedure for $\alpha = .05$ and various values of $\omega$. Notice that, as $\omega$ increases, so does the value of $\theta - \theta_0$ at which the minimizing procedure has the same expected length as the usual procedure: for $\omega = 4\sigma$, $\theta - \theta_0$ can be about $1.5\omega = 6\sigma$ before the usual procedure is better; for $\omega = 2\sigma$, about $1.6\omega = 3.2\sigma$; for $\omega = 0$, about $2\sigma$ (from Table 1). Table 2 also gives the weighted average (6) of the expected length of the minimizing procedure. Thus for $\omega = 4\sigma$, the minimizing confidence procedure has weighted average expected length 1.5% less than the usual procedure, and in this sense saves 3% of the observations, so that the usual procedure is 97% efficient. For $\omega \leqq 2\sigma$, however, the usual procedure wastes more than 10% of the observations, in the same sense. To answer (B), Table 3 gives, for $\alpha = .05$ and $\omega = \sigma$, the posterior probability of the usual interval and the minimizing confidence interval when the prior density is normal with variance $\omega^2 = \sigma^2$.
The $X$-scale reflects the fact that, with this prior, the prior (marginal) variance of $X$ is $2\sigma^2$. Even for a priori probable $X$'s, the confidence level .95 is a poor approximation to the posterior probability of either interval, though much poorer for the usual interval. Thus, by using a prior distribution, one can obtain a confidence procedure with prior expected length substantially less than that of the usual procedure (11% less in this instance, corresponding to a 23% smaller sample size), but considerable discrepancy remains between confidence and posterior probability. In Section 2, we obtain the minimizing procedure and the formulas used in calculating the tables. Section 3 concerns conditioning on the event that the confidence region covers the true value of $\theta$. Section 4 consists of remarks, which are largely independent of Sections 2 and 3. I am grateful to the referee for a number of suggestions and comments.
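
As an illustration of the comparison described for Table 1, the following Python sketch evaluates the expected length of interval (5) when the guess $\theta'$ is off by a given amount, and compares it with the fixed length of the usual interval. It is not taken from the paper: the function names and the parameter `delta` (true mean minus $\theta'$) are hypothetical, and the closed form rests on the standard identity $E[\max(0, Y)] = \mu\Phi(\mu/s) + s\,\varphi(\mu/s)$ for $Y$ normal with mean $\mu$ and standard deviation $s$.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def expected_positive_part(mean: float, sd: float) -> float:
    """Return E[max(0, Y)] for Y ~ N(mean, sd^2)."""
    z = mean / sd
    return mean * STD.cdf(z) + sd * STD.pdf(z)


def expected_length_guess(delta: float, sigma: float = 1.0,
                          alpha: float = 0.05) -> float:
    """Expected length of interval (5) when the true mean is theta' + delta.

    Hypothetical helper: `delta` is the error of the guess theta'.
    """
    c = STD.inv_cdf(1.0 - alpha) * sigma  # xi_alpha * sigma (upper alpha-point)
    # With D = X - theta' ~ N(delta, sigma^2), the length of (5) equals
    # max(0, D + c) + max(0, c - D).
    return (expected_positive_part(delta + c, sigma)
            + expected_positive_part(c - delta, sigma))


def usual_length(sigma: float = 1.0, alpha: float = 0.05) -> float:
    """Length of the usual interval X +/- xi_{alpha/2} * sigma."""
    return 2.0 * STD.inv_cdf(1.0 - alpha / 2.0) * sigma


if __name__ == "__main__":
    for delta in (0.0, 1.0, 2.0, 2.5, 3.0):
        ratio = expected_length_guess(delta) / usual_length()
        print(f"delta = {delta:.1f} sigma: expected length ratio = {ratio:.3f}")
```

Under these assumptions the ratio is below 1 for a correct guess and rises above 1 once the guess is wrong by a little more than $2\sigma$, which is consistent with the qualitative statement above for $\alpha = .05$.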

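The gap between confidence level and posterior probability raised in question (B) can be sketched for the usual interval alone. The Python snippet below assumes the normal prior $\varphi(\theta; \theta_0, \omega^2)$ from the abstract and uses the standard conjugate normal update; the function names are hypothetical, and the snippet does not attempt to reproduce Table 3 or the minimizing procedure $R_\omega(X)$.

```python
from statistics import NormalDist


def posterior(x: float, theta0: float = 0.0, sigma: float = 1.0,
              omega: float = 1.0) -> NormalDist:
    """Posterior of theta given X = x under the prior N(theta0, omega^2)."""
    precision = 1.0 / sigma**2 + 1.0 / omega**2
    mean = (x / sigma**2 + theta0 / omega**2) / precision
    return NormalDist(mean, precision ** -0.5)


def posterior_prob_usual(x: float, theta0: float = 0.0, sigma: float = 1.0,
                         omega: float = 1.0, alpha: float = 0.05) -> float:
    """Posterior probability that theta lies in the usual interval around x."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2-point
    post = posterior(x, theta0, sigma, omega)
    return post.cdf(x + z * sigma) - post.cdf(x - z * sigma)


if __name__ == "__main__":
    # With omega = sigma, the marginal standard deviation of X is sqrt(2) * sigma.
    for x in (0.0, 1.0, 2.0, 3.0):
        print(f"x - theta0 = {x:.1f} sigma: "
              f"posterior probability = {posterior_prob_usual(x):.4f}")
```

With $\omega = \sigma$, the posterior probability of the usual interval exceeds .95 when $X$ is near $\theta_0$ and falls below it as $X$ moves away, while a very diffuse prior (large `omega`) brings it back to about .95.
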
Citation

John W. Pratt. "Shorter Confidence Intervals for the Mean of a Normal Distribution with Known Variance." Ann. Math. Statist. 34 (2) 574 - 586, June, 1963. https://doi.org/10.1214/aoms/1177704170