Abstract
We consider the regression model with observation error in the design:
\begin{eqnarray*}y&=&X\theta^*+\xi,\\ Z&=&X+\Xi.\end{eqnarray*}
Here the random vector $y\in\mathbb{R}^n$ and the random $n\times p$ matrix $Z$ are observed, the $n\times p$ matrix $X$ is unknown, $\Xi$ is an $n\times p$ random noise matrix, $\xi\in\mathbb{R}^n$ is a random noise vector, and $\theta^*$ is a vector of unknown parameters to be estimated. We consider the setting where the dimension $p$ can be much larger than the sample size $n$ and $\theta^*$ is sparse. Because of the presence of the noise matrix $\Xi$, the commonly used Lasso and Dantzig selector are unstable. An alternative procedure, the Matrix Uncertainty (MU) selector, was proposed in Rosenbaum and Tsybakov [The Annals of Statistics 38 (2010) 2620–2651] to account for this noise. That paper studied the properties of the MU selector for sparse $\theta^*$ under the assumption that the noise matrix $\Xi$ is deterministic and its values are small. In this paper, we propose a modification of the MU selector for the case where $\Xi$ is a random matrix with zero-mean entries whose variances can be estimated. This is, for example, the case in the model where the entries of $X$ are missing at random. We show both theoretically and numerically that, under these conditions, the new estimator, called the Compensated MU selector, achieves better estimation accuracy than the original MU selector.
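To make the model concrete, the following is a minimal simulation sketch and not the estimator studied in the paper. It draws data from $y=X\theta^*+\xi$, $Z=X+\Xi$ and compares the Gram matrix of the observed design $Z$ with that of the unknown $X$. All names (`simulate`, `gram_errors`, `sigma_noise`) and all numerical settings are hypothetical. The design is Gaussian, the noise variance is assumed known and equal across entries, and $p$ is kept small for readability, whereas the abstract concerns $p$ much larger than $n$. Subtracting the noise variance from the diagonal of $Z^\top Z/n$ is shown only as one plausible illustration of why estimable variances of $\Xi$ are useful; the abstract itself does not define the Compensated MU selector.

```python
import numpy as np


def simulate(n=5000, p=5, sigma_noise=0.5, seed=0):
    """Draw (X, Z, y, theta_star) from y = X theta* + xi, Z = X + Xi.

    All settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                  # unknown true design
    theta_star = np.zeros(p)
    theta_star[0] = 1.0                              # sparse parameter vector
    xi = 0.1 * rng.standard_normal(n)                # response noise
    Xi = sigma_noise * rng.standard_normal((n, p))   # zero-mean design noise
    y = X @ theta_star + xi
    Z = X + Xi                                       # observed noisy design
    return X, Z, y, theta_star


def gram_errors(X, Z, sigma_noise):
    """Max-entry error of the naive and variance-corrected Gram matrices."""
    n, p = X.shape
    gram_true = X.T @ X / n
    gram_naive = Z.T @ Z / n
    # Assumed correction: remove the known noise variance from the diagonal.
    gram_corrected = gram_naive - sigma_noise**2 * np.eye(p)
    naive_err = np.abs(gram_naive - gram_true).max()
    corrected_err = np.abs(gram_corrected - gram_true).max()
    return naive_err, corrected_err


X, Z, y, theta_star = simulate()
naive_err, corrected_err = gram_errors(X, Z, sigma_noise=0.5)
print("naive Gram error:", naive_err)
print("corrected Gram error:", corrected_err)
```

In this sketch the naive Gram matrix is off by roughly the noise variance on its diagonal, which is the kind of systematic error that a procedure using variance estimates of $\Xi$ can remove.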
Information
Digital Object Identifier: 10.1214/12-IMSCOLL920