The Annals of Mathematical Statistics

On Two-Stage Non-Parametric Estimation

Elizabeth H. Yen

Abstract

In this paper, a two-sample, two-stage nonparametric estimation problem is studied. The parameter $\theta = \theta(F, G)$ under consideration is estimable (i.e., there exists an unbiased estimator $\phi = \phi(X_1, \cdots, X_r; Y_1, \cdots, Y_s)$ of $\theta$). $\phi$ is a function of independent observations from two populations with cumulative distribution functions $F(X)$ and $G(Y)$. The functions $F(X)$ and $G(Y)$ belong to a specified class $D$, such that a $U$-statistic based on $\phi$ is the unique minimum variance unbiased estimator of $\theta$. The total number of observations on populations $X$ and $Y$ is a fixed number $N$. The sampling procedure is carried out in two stages. First, take $M$ observations from each of the populations; then allocate the remaining $N - 2M$ observations between the populations. The method of allocation utilizes the information from the first stage observations. Two kinds of two-stage estimators, denoted by $U'$ and $U''$, are introduced in this paper. Both $U'$ and $U''$ are $U$-statistics with random sample sizes. $U'$ is based essentially on the second stage observations only. $U''$ is defined on all $N$ observations. Intuitively, the statistic $U''$ is more appealing: the first stage observations are used not only to determine the allocation of the second stage observations, but also to estimate the parameter $\theta$ (see Section 3). One of the main results (Section 4) is that $U'$ is unbiased and, under certain conditions, the variance of $U'$ asymptotically approaches a particular variance $V_0$. (Here we consider only the cases in which the variances of both $U'$ and $U''$ are finite.) $U''$ is in general biased. However, under the same conditions the value $E(U'' - \theta)^2$ asymptotically approaches the same value $V_0$.
This value $V_0$ is the smallest variance of any one-stage $U$-statistic estimator of $\theta$, subject to the restriction that the total number of observations on $X$ and on $Y$ is $N$. $V_0$ is computed (see Section 2) when the best one-stage allocation of $N$ observations to the two populations is made with the help of partial or even complete information about the distributions $F(X)$ and $G(Y)$. Such information about $F$ and $G$ is represented by the "nuisance parameters" $b_{10} = b_{10}(F, G)$, $b_{01} = b_{01}(F, G)$, etc., defined in Section 2. No prior knowledge of $b_{10}$ and $b_{01}$ is required to compute $\operatorname{Var}(U')$ and $E(U'' - \theta)^2$. In Section 5, the "optimal" choice of the first stage sample size $M$ relative to the fixed total sample size $N$ is discussed. Here "optimal" refers to the choices of $M$, in order of magnitude relative to $N$, for which the ratios $\operatorname{Var}(U')/V_0$ and $E(U'' - \theta)^2/V_0$ approach unity as fast as possible, in order of magnitude of $N$, as $N$ goes to infinity. Three cases with different conditions on $\phi$ are considered. It is found that the "optimal" choices depend on the specific conditions. Section 6 contains some examples. For each $\theta(F, G)$, the corresponding estimators for $b_{10}$ and $b_{01}$, together with their behavior under different conditions on $F$ and $G$, are given. The examples include cases where the proposed procedures can be applied as well as cases where they cannot be applied. Section 7 shows the asymptotic normality of $U'$ and $U''$. Section 8 indicates that the proposed procedures can be extended to the $k$-sample case, for $k > 2$, with similar results. The technique of two-stage estimation has been used in several papers. Stein used it to determine a confidence interval of a pre-assigned length for the mean of a normal population with unknown variance.
Putter used it to estimate the mean of a stratified normal population. Robbins discussed such a technique for the design of experiments. Later, Ghurye and Robbins used it to estimate the difference between the means of two normal populations (or some other specified populations). Richter discussed the estimation of the common mean of two normal populations. During the preparation of the present paper, Alam discussed the estimation of the common mean of $k \geqq 2$ normal populations. This paper generalizes these two-stage procedures in two ways. First, the underlying cumulative distributions $F, G$ are members of a larger class of distributions. Secondly, the underlying parameters $\theta(F, G)$ are not restricted to population means or functions of means. Consequently, in such a general setup the question of "the best" estimator of any particular parameter $\theta(F, G)$ is not considered in this paper.
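As a rough illustration of the two-stage idea, the sketch below treats only the special case $\theta = E(X) - E(Y)$, the difference of two means. It is not the paper's construction of $U'$ or $U''$, and the paper's nuisance parameters $b_{10}$, $b_{01}$ are not reproduced. In this special case the one-stage variance $\sigma_X^2/m + \sigma_Y^2/n$ with $m + n = N$ is minimized by taking $m$ proportional to $\sigma_X$ and $n$ proportional to $\sigma_Y$, giving the value $(\sigma_X + \sigma_Y)^2/N$. The sketch replaces the unknown standard deviations by first-stage estimates and then, loosely in the spirit of an estimator defined on all $N$ observations, takes the difference of the overall sample means. The function names, the rounding rule, and the samplers are assumptions made for illustration.

```python
import random
import statistics


def allocate_second_stage(first_x, first_y, total_n):
    """Split the remaining total_n - 2M observations between X and Y.

    Assumed rule (difference-of-means case only): aim for total sample
    sizes proportional to the first-stage standard deviations.
    """
    m = len(first_x)
    assert len(first_y) == m and m >= 2 and total_n >= 2 * m
    sx = statistics.stdev(first_x)
    sy = statistics.stdev(first_y)
    # Fall back to an even split if both first-stage samples are constant.
    share_x = 0.5 if sx + sy == 0 else sx / (sx + sy)
    target_x = round(total_n * share_x)
    remaining = total_n - 2 * m
    # The first stage already used m observations on X; never exceed the budget.
    extra_x = min(max(target_x - m, 0), remaining)
    return extra_x, remaining - extra_x


def two_stage_mean_difference(draw_x, draw_y, total_n, m):
    """Two-stage estimate of E(X) - E(Y) using all total_n observations."""
    first_x = [draw_x() for _ in range(m)]
    first_y = [draw_y() for _ in range(m)]
    extra_x, extra_y = allocate_second_stage(first_x, first_y, total_n)
    all_x = first_x + [draw_x() for _ in range(extra_x)]
    all_y = first_y + [draw_y() for _ in range(extra_y)]
    estimate = statistics.fmean(all_x) - statistics.fmean(all_y)
    return estimate, len(all_x), len(all_y)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical populations: X is noisier than Y, so X should get more draws.
    est, n_x, n_y = two_stage_mean_difference(
        lambda: random.gauss(1.0, 3.0), lambda: random.gauss(0.0, 1.0),
        total_n=200, m=20,
    )
    print(est, n_x, n_y)
```

Because the final sample sizes depend on the first-stage data, an estimator of this kind that reuses the first-stage observations need not be unbiased in general, which is the same trade-off the abstract notes between $U'$ and $U''$.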

Article information

Source
Ann. Math. Statist., Volume 35, Number 3 (1964), 1099-1114.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoms/1177703268

Digital Object Identifier
doi:10.1214/aoms/1177703268

Mathematical Reviews number (MathSciNet)
MR165634

Zentralblatt MATH identifier
0128.13203
