The Annals of Mathematical Statistics

Some Properties of the Least Squares Estimator in Regression Analysis when the Predictor Variables are Stochastic

P. K. Bhattacharya

Abstract

In the classical linear estimation set-up, we have \begin{equation*}\tag{1}E\mathbf{y} = X\mathbf{\theta},\end{equation*} where $\mathbf{y}' = (y_1, \cdots, y_n)$ is a random vector whose components are uncorrelated and have equal variance, $\mathbf{\theta}' = (\theta_0, \theta_1, \cdots, \theta_p)$ is a vector whose elements are unknown constants, and $X$ is a matrix of $n$ rows and $p + 1$ columns, $n \geqq p + 1$, which has full rank and whose elements are known constants. Plackett gives a historical note on the least squares estimator $\hat{\mathbf{\theta}} = (X'X)^{-1}X'\mathbf{y}$ of $\mathbf{\theta}$, for which the following property is well known: (I) Each component $\hat{\theta}_j$ of $\hat{\mathbf{\theta}}$ is the estimator with uniformly minimum variance among all unbiased linear estimators of the corresponding component $\theta_j$ of $\mathbf{\theta}$. It can also be easily seen that: (II) For a quadratic loss function for the estimation of each component $\theta_j$ of $\mathbf{\theta}$, the least squares estimators have uniformly minimum risk among the class of all estimators that are linear in the $y$'s and have bounded risk. Properties (I) and (II) hold for model (1) with uncorrelated and homoscedastic $y_1, \cdots, y_n$.
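As a concrete illustration of the estimator $\hat{\mathbf{\theta}} = (X'X)^{-1}X'\mathbf{y}$ in model (1), the following is a minimal sketch in plain Python. It is not taken from the paper: the function name `least_squares`, the elimination routine, and the toy data are all assumptions made here for illustration. It solves the normal equations $(X'X)\mathbf{\theta} = X'\mathbf{y}$ directly and reports an error when $X$ does not have full column rank.

```python
def least_squares(X, y):
    """Return theta_hat solving the normal equations (X'X) theta = X'y.

    X is a list of n rows, each with p + 1 entries (first entry 1 for the
    intercept); y is a list of n responses. Hypothetical helper, for
    illustration only.
    """
    n, k = len(X), len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("X does not have full column rank")
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # The last column now holds theta_hat.
    return [A[i][k] for i in range(k)]


# Toy data (assumed): y = 1 + 2x with no noise, so theta_hat recovers (1, 2)
# up to floating-point rounding.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta_hat = least_squares(X, y)
```

Solving the normal equations by elimination, rather than forming $(X'X)^{-1}$ explicitly, keeps the sketch short; a numerical library would normally use a QR or similar factorization instead.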
Hodges and Lehmann have shown that if we do not restrict ourselves to estimators which are linear in the $y$'s, then the least squares estimators have the following weaker property: (III) If the loss in estimating the true vector $\mathbf{\theta}$ by another $\mathcal{B}$ is $(\mathbf{\theta} - \mathcal{B})' (\mathbf{\theta} - \mathcal{B})$, then the least squares estimator is minimax among the class of all estimators of $\mathbf{\theta}$ if there exists a number $v$ such that $\operatorname{Var} y_i \leqq v, i = 1, \cdots, n$, and the family $\mathscr{F}$ of distributions of $(y_1, \cdots, y_n)$ contains the sub-family $\mathscr{F}_0$ of all independent normal distributions of $(y_1, \cdots, y_n)$ which satisfy (1) for some $\mathbf{\theta}$ and have $\operatorname{Var} y_i = v, i = 1, \cdots, n$. In many situations, however, $(y, x_1, \cdots, x_p)$ follows a $(p + 1)$-variate distribution on which observations are made, and the method of least squares is applied to estimate the linear regression of $y$ on $x_1, \cdots, x_p$, regarding the $x$-observations as non-stochastic. This problem differs from the classical problem of linear estimation because instead of (1) the model is $E\lbrack\mathbf{y} \mid X\rbrack = X\mathbf{\theta}$, where the elements of the $X$ matrix are stochastic. For reasons given in Section 2, the loss in estimating the true regression function $\phi(x_1, \cdots, x_p)$ by another function $\psi(x_1, \cdots, x_p)$ is considered to be of the form $\int \lbrack\phi(x_1, \cdots, x_p) - \psi(x_1, \cdots, x_p)\rbrack^2 dF(x_1, \cdots, x_p),$ where $F$ is the distribution function of $(x_1, \cdots, x_p)$. For the above loss function, it is shown under certain conditions that if the class of estimates which are linear in the $y$'s and have bounded risk is non-empty, then the estimate obtained by the method of least squares belongs to this class and has uniformly minimum risk in this class.
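To see how the integrated loss above relates to the loss in (III), consider the following worked special case. It is an illustration under assumptions not stated in the abstract: both $\phi$ and $\psi$ are taken to be linear, $\phi(x_1, \cdots, x_p) = \theta_0 + \sum_{j=1}^{p} \theta_j x_j$ and $\psi(x_1, \cdots, x_p) = b_0 + \sum_{j=1}^{p} b_j x_j$, and $(x_1, \cdots, x_p)$ is assumed to have finite second moments. The symbols $\mathbf{b}$, $\tilde{\mathbf{x}}$ and $M$ are introduced here only for this sketch. Writing $\tilde{\mathbf{x}}' = (1, x_1, \cdots, x_p)$, we have $\phi - \psi = \tilde{\mathbf{x}}'(\mathbf{\theta} - \mathbf{b})$, so that \begin{equation*}\int \lbrack\phi - \psi\rbrack^2 dF = (\mathbf{\theta} - \mathbf{b})' M (\mathbf{\theta} - \mathbf{b}), \qquad M = \int \tilde{\mathbf{x}}\tilde{\mathbf{x}}' \, dF(x_1, \cdots, x_p).\end{equation*} In this special case the loss is a quadratic form in $\mathbf{\theta} - \mathbf{b}$ weighted by the second-moment matrix of the predictors, and it coincides with the unweighted loss $(\mathbf{\theta} - \mathbf{b})'(\mathbf{\theta} - \mathbf{b})$ of (III) only when $M$ is the identity matrix.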
A necessary and sufficient condition on $F(x_1, \cdots, x_p)$ is obtained for this class to be non-empty, which unfortunately is not easy to verify in particular cases. However, by a sequential modification of the sampling scheme, this condition may always be satisfied at the cost of an arbitrarily small increase in the expected sample size. It is also shown under certain further conditions on the family of admissible distributions that the least squares estimator is minimax in the class of all estimators. For the case of a multivariate normal distribution of $(y, x_1, \cdots, x_p)$, Stein has considered this problem under a loss function similar to the one given above. He has shown the minimax property of the least squares estimates (which also happen to be the maximum likelihood estimates in a multivariate normal model) for the regression coefficients, and has raised many interesting questions about the admissibility of these estimates.

Article information

Source
Ann. Math. Statist., Volume 33, Number 4 (1962), 1365-1374.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoms/1177704369

Digital Object Identifier
doi:10.1214/aoms/1177704369

Mathematical Reviews number (MathSciNet)
MR141189

Zentralblatt MATH identifier
0109.13203
