## The Annals of Statistics

- Ann. Statist.
- Volume 40, Number 1 (2012), 73-103.

### UPS delivers optimal phase diagram in high-dimensional variable selection

#### Abstract

Consider a linear model *Y* = *Xβ* + *z*, *z* ∼ *N*(0, *I*_{n}). Here, *X* = *X*_{n,p}, where both *p* and *n* are large, but *p* > *n*. We model the rows of *X* as i.i.d. samples from *N*(0, (1/*n*)Ω), where Ω is a *p* × *p* correlation matrix, which is unknown to us but is presumably sparse. The vector *β* is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros.
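As an illustration of this setup, the following minimal sketch simulates data from the model. It assumes a tridiagonal Ω with a constant off-diagonal correlation and equal-strength signals on a random support. The function name `simulate_linear_model` and its parameters (`rho`, `num_signals`, `signal_strength`) are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np


def simulate_linear_model(n, p, num_signals, signal_strength, rho=0.3, seed=0):
    """Draw (X, y, beta, omega) from Y = X beta + z with z ~ N(0, I_n).

    Assumption for illustration: omega is tridiagonal with off-diagonal
    entries rho (positive definite when |rho| < 0.5).
    """
    rng = np.random.default_rng(seed)

    # Tridiagonal correlation matrix: ones on the diagonal, rho next to it.
    omega = np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))

    # Rows of X are i.i.d. N(0, omega / n): multiply standard normal rows
    # by the transposed Cholesky factor of omega / n.
    chol = np.linalg.cholesky(omega / n)
    X = rng.standard_normal((n, p)) @ chol.T

    # Sparse coefficient vector: a few nonzero coordinates, all equal here.
    beta = np.zeros(p)
    support = rng.choice(p, size=num_signals, replace=False)
    beta[support] = signal_strength

    y = X @ beta + rng.standard_normal(n)
    return X, y, beta, omega
```

For example, `simulate_linear_model(n=50, p=80, num_signals=5, signal_strength=3.0)` gives a design with *p* > *n* and five nonzero coefficients.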

We propose the Univariate Penalization Screening (UPS) for variable selection. This is a screen and clean method where we screen with univariate thresholding and clean with penalized MLE. It has two important properties: sure screening and separable after screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation.
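The screen and clean idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' exact procedure: screening thresholds the marginal statistics *X*ᵀ*Y*, surviving variables are grouped by thresholding the empirical Gram matrix, and cleaning uses a brute-force L0-penalized least-squares fit within each group as a stand-in for the penalized MLE. The names `screen_and_clean`, `screen_threshold`, `l0_penalty` and `gram_cutoff` are hypothetical.

```python
from itertools import combinations

import numpy as np


def screen_and_clean(X, y, screen_threshold, l0_penalty, gram_cutoff=0.1):
    """Two-stage variable selection sketch (illustrative, not the paper's UPS)."""
    p = X.shape[1]

    # Screen: keep variables whose marginal statistic exceeds the threshold.
    marginal = X.T @ y
    survivors = np.flatnonzero(np.abs(marginal) > screen_threshold)

    # Split survivors into connected components of the graph that links
    # two variables when their Gram entry is large in absolute value.
    gram = X.T @ X
    unvisited = set(survivors.tolist())
    components = []
    while unvisited:
        stack = [unvisited.pop()]
        component = []
        while stack:
            j = stack.pop()
            component.append(j)
            neighbors = [k for k in unvisited if abs(gram[j, k]) > gram_cutoff]
            for k in neighbors:
                unvisited.remove(k)
                stack.append(k)
        components.append(sorted(component))

    # Clean: fit each component separately by exhaustive search over its
    # subsets. This is exponential in the component size, so it is only
    # practical when the components are small.
    beta_hat = np.zeros(p)
    for component in components:
        best_obj = 0.5 * float(y @ y)  # objective of the empty model
        best_subset, best_coef = (), None
        for size in range(1, len(component) + 1):
            for subset in combinations(component, size):
                cols = X[:, list(subset)]
                coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
                resid = y - cols @ coef
                obj = 0.5 * float(resid @ resid) + l0_penalty * size
                if obj < best_obj:
                    best_obj, best_subset, best_coef = obj, subset, coef
        if best_subset:
            beta_hat[list(best_subset)] = best_coef
    return beta_hat
```

The per-component search reflects the point made above: once screening leaves small, separable groups of variables, each group can be fitted on its own instead of solving one large regression.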

We measure the performance of a procedure by the Hamming distance, and use an asymptotic framework where *p* → ∞ and other quantities (e.g., *n*, sparsity level and strength of signals) are linked to *p* by fixed parameters. We find that in many cases, the UPS achieves the optimal rate of convergence. Also, for many different Ω, there is a common three-phase diagram in the two-dimensional phase space quantifying the signal sparsity and signal strength. In the first phase, it is possible to recover all signals. In the second phase, it is possible to recover most of the signals, but not all of them. In the third phase, successful variable selection is impossible. UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals as long as successful variable selection is possible.
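For a single realization, a Hamming-type selection error can be computed as in the short sketch below. It assumes the error counts coordinates where the sign of the estimate disagrees with the sign of the true coefficient; the criterion studied in the asymptotic framework would be the expectation of such a count. The function name `hamming_selection_error` is a hypothetical choice for the example.

```python
import numpy as np


def hamming_selection_error(beta_hat, beta):
    """Count coordinates where sign(beta_hat) differs from sign(beta).

    Assumption for illustration: missed signals, false discoveries and
    wrong-sign discoveries each contribute one unit of error.
    """
    estimated_signs = np.sign(np.asarray(beta_hat, dtype=float))
    true_signs = np.sign(np.asarray(beta, dtype=float))
    return int(np.count_nonzero(estimated_signs != true_signs))
```

An error of zero corresponds to recovering every signal with no false discoveries, which is the situation described for the first phase.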

The lasso and subset selection are well-known approaches to variable selection. However, somewhat surprisingly, there are regions in the phase space where neither of them is rate optimal, even in very simple settings, such as when Ω is tridiagonal, and even when the tuning parameter is ideally set.

#### Article information

**Source**

Ann. Statist., Volume 40, Number 1 (2012), 73-103.

**Dates**

First available in Project Euclid: 15 March 2012

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1331830775

**Digital Object Identifier**

doi:10.1214/11-AOS947

**Mathematical Reviews number (MathSciNet)**

MR3013180

**Zentralblatt MATH identifier**

1246.62160

**Subjects**

Primary: 62J05: Linear regression; 62J07: Ridge regression, shrinkage estimators

Secondary: 62G20: Asymptotic properties; 62C05: General considerations

**Keywords**

Graph, Hamming distance, lasso, Stein’s normal means, penalization methods, phase diagram, screen and clean, subset selection, variable selection

#### Citation

Ji, Pengsheng; Jin, Jiashun. UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 (2012), no. 1, 73--103. doi:10.1214/11-AOS947. https://projecteuclid.org/euclid.aos/1331830775

#### Supplemental materials

- Supplementary material: Supplementary material for “UPS delivers optimal phase diagram in high-dimensional variable selection”. Owing to space constraints, the technical proofs are moved to a supplementary document [18]. Digital Object Identifier: doi:10.1214/11-AOS947SUPP