## The Annals of Statistics

- Ann. Statist.
- Volume 45, Number 4 (2017), 1810-1833.

### Sharp detection in PCA under correlations: All eigenvalues matter

#### Abstract

Principal component analysis (PCA) is a widely used method for dimension reduction. In high-dimensional data, the “signal” eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the “noise” eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS)—averaging effects over *all* eigenvalues—were recently shown to achieve some power.
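The "bulk" and the separation threshold can be made concrete in the simplest setting. Take identity population covariance with aspect ratio γ = p/n < 1. There, the noise eigenvalues follow the Marchenko-Pastur law on [(1-√γ)², (1+√γ)²]. In a single-spike model Σ = I + h v vᵀ, the largest sample eigenvalue converges to the bulk edge unless h > √γ (the BBP phase transition). This is why top-eigenvalue tests lose power against weak PCs.

The sketch below uses these standard random-matrix facts for the spiked model, not the paper's general correlated model. The function names are our own.

```python
import numpy as np

def mp_edges(gamma):
    """Support [a, b] of the Marchenko-Pastur law, unit noise variance, 0 < gamma < 1."""
    r = np.sqrt(gamma)
    return (1.0 - r) ** 2, (1.0 + r) ** 2

def mp_density(x, gamma):
    """Marchenko-Pastur density for identity population covariance (0 < gamma < 1)."""
    a, b = mp_edges(gamma)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)          # density vanishes outside the open bulk
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2.0 * np.pi * gamma * xi)
    return dens

def top_eigenvalue_limit(h, gamma):
    """Almost-sure limit of the largest sample eigenvalue when Sigma = I + h v v^T."""
    if h <= np.sqrt(gamma):
        return mp_edges(gamma)[1]            # weak spike: top eigenvalue sticks to the bulk edge
    return (1.0 + h) * (1.0 + gamma / h)     # strong spike: separates from the bulk

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids NumPy-version differences in np.trapz/np.trapezoid)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

if __name__ == "__main__":
    gamma = 0.5
    a, b = mp_edges(gamma)
    grid = np.linspace(a, b, 200_001)
    print("bulk support:", (round(a, 3), round(b, 3)))
    print("total mass:", round(trapezoid(mp_density(grid, gamma), grid), 4))
    for h in (0.3, 0.7, 1.0, 2.0):
        print(f"h={h}: top eigenvalue limit {top_eigenvalue_limit(h, gamma):.3f}")
```

With γ = 0.5 the bulk edge is (1+√0.5)² ≈ 2.914. A spike h = 0.3 therefore leaves the top eigenvalue at the edge, while h = 2 moves it to 3 · 1.25 = 3.75.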

We consider a “local alternatives” model for the spectrum of covariance matrices that allows a general correlation structure. We develop new tests to detect PCs in this model. While the top eigenvalue contains little information, due to the strong correlations between the eigenvalues, we can detect weak PCs by averaging over all eigenvalues using LSS. We show that it is possible to find the optimal LSS by solving a certain integral equation. To solve this equation, we develop efficient algorithms that build on our recent method for computing the limit empirical spectrum [Dobriban (2015)]. The solvability of this equation also presents a new perspective on phase transitions in spiked models.
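The integral equation for the optimal LSS is not reproduced here. A crude finite-dimensional stand-in does show the principle, though. It assumes, as is standard for asymptotically normal test statistics, that power is governed by the ratio of mean shift to null standard deviation. The procedure is:

1. Restrict f to a small basis (here the monomials x, x², x³, a hypothetical choice).
2. Estimate by Monte Carlo the mean shift δ and the null covariance Σ of the basis statistics Σᵢ λᵢᵏ.
3. Take weights w = Σ⁻¹δ.

This is a Fisher-discriminant-style construction, not the author's algorithm. The sample sizes, spike size and helper names are illustrative assumptions.

```python
import numpy as np

def sample_cov_eigenvalues(X):
    """Eigenvalues of X^T X / n; rows of X are mean-zero samples."""
    n = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / n)

def basis_lss(evals, powers=(1, 2, 3)):
    """Linear spectral statistics sum_i lambda_i**k, one per (hypothetical) basis function x**k."""
    return np.array([np.sum(evals ** k) for k in powers])

def simulate_lss(n, p, h, reps, rng, powers=(1, 2, 3)):
    """Monte Carlo draws of the basis LSS under Sigma = I + h * e1 e1^T."""
    scales = np.ones(p)
    scales[0] = np.sqrt(1.0 + h)            # column scaling gives covariance diag(scales**2)
    out = np.empty((reps, len(powers)))
    for r in range(reps):
        X = rng.standard_normal((n, p)) * scales
        out[r] = basis_lss(sample_cov_eigenvalues(X), powers)
    return out

def optimal_weights(delta, cov):
    """Weights maximizing (w @ delta)**2 / (w @ cov @ w), i.e. w = cov^{-1} delta.
    Returns (w, snr) with snr = sqrt(delta @ cov^{-1} @ delta)."""
    w = np.linalg.solve(cov, delta)
    return w, float(np.sqrt(delta @ w))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, h, reps = 400, 200, 0.5, 200       # gamma = 0.5; spike below sqrt(gamma) ~ 0.707
    null = simulate_lss(n, p, 0.0, reps, rng)
    alt = simulate_lss(n, p, h, reps, rng)
    delta = alt.mean(axis=0) - null.mean(axis=0)   # mean shift of each basis LSS
    cov = np.cov(null, rowvar=False)               # null covariance of the basis LSS
    w, snr = optimal_weights(delta, cov)
    single = np.abs(delta) / np.sqrt(np.diag(cov))
    print("single-function SNRs:", np.round(single, 2))
    print("combined SNR:", round(snr, 2), "weights:", np.round(w, 3))
```

On the same estimates, the combined SNR can never fall below the best single-function SNR, because Σ⁻¹δ maximizes the ratio over all weight vectors. A richer basis moves closer to the continuous problem, at the cost of a worse-conditioned Σ.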

#### Article information

**Source**

Ann. Statist., Volume 45, Number 4 (2017), 1810-1833.

**Dates**

Received: February 2016

Revised: August 2016

First available in Project Euclid: 28 June 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1498636875

**Digital Object Identifier**

doi:10.1214/16-AOS1514

**Mathematical Reviews number (MathSciNet)**

MR3670197

**Zentralblatt MATH identifier**

06773292

**Subjects**

Primary: 62H25: Factor analysis and principal components; correspondence analysis

Secondary: 62H15: Hypothesis testing; 45B05: Fredholm integral equations

**Keywords**

Principal component analysis; linear spectral statistic; random matrix theory; linear integral equation; optimal testing

#### Citation

Dobriban, Edgar. Sharp detection in PCA under correlations: All eigenvalues matter. Ann. Statist. 45 (2017), no. 4, 1810-1833. doi:10.1214/16-AOS1514. https://projecteuclid.org/euclid.aos/1498636875

#### Supplemental materials

- Supplement to “Sharp detection in PCA under correlations: All eigenvalues matter”. In the supplementary material, we give the remaining details of the proofs, algorithms implementing our method and further simulations. Digital Object Identifier: doi:10.1214/16-AOS1514SUPP