The Annals of Statistics

Sparse principal component analysis and iterative thresholding

Zongming Ma

Abstract

Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
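To make the idea of iterative thresholding concrete, here is a minimal sketch for a single spike: a power-type (orthogonal) iteration on the sample covariance matrix, with an entrywise soft-thresholding step before each re-orthonormalization. It illustrates the general technique and is not the paper's exact algorithm. The abstract does not specify the initialization, the thresholding rule, or the threshold levels, so every choice below is an assumption: the function names (soft_threshold, thresholded_orthogonal_iteration, demo_alignment), the variance-based initial support, the fixed threshold of 1.5, and the simulation parameters.

```python
import numpy as np


def soft_threshold(x, t):
    """Entrywise soft thresholding: shrink magnitudes by t, zeroing small entries."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def thresholded_orthogonal_iteration(S, Q0, threshold, n_iter=50):
    """Orthogonal iteration on S with a thresholding step (illustrative sketch).

    S         : (p, p) sample covariance matrix
    Q0        : (p, m) initial matrix with orthonormal columns
    threshold : assumed fixed threshold level (a tuning choice, not from the paper)
    """
    Q = Q0
    for _ in range(n_iter):
        T = S @ Q                          # multiplication step
        T = soft_threshold(T, threshold)   # thresholding step promotes sparsity
        if not T.any():
            raise ValueError("threshold removed every entry; use a smaller value")
        Q, _ = np.linalg.qr(T)             # re-orthonormalize the columns
    return Q


def demo_alignment(seed=0):
    """Simulate a single-spike covariance model and return |<v, v_hat>|."""
    rng = np.random.default_rng(seed)
    n, p, k, spike = 200, 500, 10, 20.0    # hypothetical simulation settings

    # Sparse leading eigenvector supported on the first k coordinates.
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)

    # Rows x_i = sqrt(spike) * u_i * v + z_i, so Cov(x_i) = spike * v v^T + I.
    u = rng.standard_normal(n)
    X = np.sqrt(spike) * np.outer(u, v) + rng.standard_normal((n, p))
    S = X.T @ X / n

    # Assumed initialization: PCA restricted to the k largest-variance coordinates.
    idx = np.argsort(np.diag(S))[-k:]
    _, vecs = np.linalg.eigh(S[np.ix_(idx, idx)])
    Q0 = np.zeros((p, 1))
    Q0[idx, 0] = vecs[:, -1]               # eigh sorts eigenvalues in ascending order

    Q = thresholded_orthogonal_iteration(S, Q0, threshold=1.5)
    return abs(v @ Q[:, 0])


if __name__ == "__main__":
    print("alignment with the true eigenvector:", demo_alignment())
```

In this sketch the thresholding step is what separates the procedure from ordinary orthogonal iteration: coordinates whose entries in the product stay below the threshold are set to zero, so the estimate remains sparse even though p exceeds n.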

Article information

Source
Ann. Statist., Volume 41, Number 2 (2013), 772-801.

Dates
First available in Project Euclid: 8 May 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1368018173

Digital Object Identifier
doi:10.1214/13-AOS1097

Mathematical Reviews number (MathSciNet)
MR3099121

Zentralblatt MATH identifier
1267.62074

Citation

Ma, Zongming. Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 (2013), no. 2, 772-801. doi:10.1214/13-AOS1097. https://projecteuclid.org/euclid.aos/1368018173

Supplemental materials

• Supplementary material: Supplement to “Sparse principal component analysis and iterative thresholding”. The supplement gives proofs of Corollaries 3.1 and 3.2, Proposition 3.1, and all the claims in Section 6.