The Annals of Statistics
- Ann. Statist.
- Volume 41, Number 2 (2013), 772-801.
Sparse principal component analysis and iterative thresholding
Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
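The abstract names the method without spelling it out. As a rough illustration only, the sketch below implements one generic form of iterative thresholding for a sparse principal subspace. It runs orthogonal iteration on the sample covariance $S$: each step multiplies by $S$, soft-thresholds the result entrywise, and re-orthonormalizes with a QR factorization.

The toy data follow a single-spike instance of a spiked covariance model $\Sigma = \sum_{j=1}^{r} \lambda_j q_j q_j^\top + \sigma^2 I_p$, with a sparse $q_1$ and $p > n$.

Everything in the code is an assumption made for illustration, not the paper's algorithm:

- The function names are hypothetical.
- The noise level `sigma2` is taken as known.
- The initialization keeps coordinates with large sample variance, controlled by a hypothetical constant `alpha`.
- Soft thresholding is used.
- The single threshold `gamma = 2.0` is hand-picked for this example rather than chosen by any rule from the paper.

```python
import numpy as np

# Hypothetical sketch: initialization, soft thresholding and the fixed threshold
# gamma are illustrative assumptions, not the choices made in the paper.


def soft_threshold(x, t):
    """Entrywise soft thresholding: shrink toward zero by t, zeroing small entries."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def subspace_loss(A, B):
    """Frobenius distance between the projections onto span(A) and span(B)."""
    return np.linalg.norm(A @ A.T - B @ B.T, "fro")


def sparse_pca_iterative_thresholding(X, r, gamma, sigma2=1.0, alpha=3.0,
                                      max_iter=200, tol=1e-8):
    """Thresholded orthogonal iteration for a sparse r-dimensional subspace."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n  # sample covariance, p x p

    # Assumed initialization: keep coordinates whose sample variance clearly
    # exceeds the (known) noise level, then take leading eigenvectors of that block.
    d = np.diag(S)
    B = np.flatnonzero(d > sigma2 * (1.0 + alpha * np.sqrt(np.log(p) / n)))
    if B.size < r:
        B = np.argsort(d)[-r:]  # fall back to the r largest variances
    _, V = np.linalg.eigh(S[np.ix_(B, B)])
    Q = np.zeros((p, r))
    Q[B, :] = V[:, -r:]  # eigh sorts eigenvalues in ascending order

    for _ in range(max_iter):
        T = soft_threshold(S @ Q, gamma)  # multiply by S, then threshold entrywise
        Q_new, _ = np.linalg.qr(T)        # re-orthonormalize the columns
        converged = subspace_loss(Q_new, Q) < tol
        Q = Q_new
        if converged:
            break
    return Q


# Toy spiked covariance data (illustrative parameters): one spike of size lam
# on a k-sparse unit vector q, plus N(0, I) noise, with p > n.
rng = np.random.default_rng(0)
n, p, k, lam = 100, 200, 10, 20.0
q = np.zeros(p)
q[:k] = 1.0 / np.sqrt(k)
X = np.sqrt(lam) * np.outer(rng.standard_normal(n), q) + rng.standard_normal((n, p))

Q_hat = sparse_pca_iterative_thresholding(X, r=1, gamma=2.0)

# Ordinary PCA for comparison: leading eigenvector of the sample covariance.
Xc = X - X.mean(axis=0)
_, V = np.linalg.eigh(Xc.T @ Xc / n)
Q_pca = V[:, -1:]

q_true = q[:, None]
loss_it = subspace_loss(Q_hat, q_true)
loss_pca = subspace_loss(Q_pca, q_true)
n_nonzero = int(np.sum(np.abs(Q_hat).max(axis=1) > 1e-10))
print(f"iterative thresholding: loss={loss_it:.3f}, nonzero rows={n_nonzero}")
print(f"ordinary PCA:           loss={loss_pca:.3f}")
```

On data like these, one would expect the thresholded estimate to be supported on (or near) the first `k` coordinates. It should also have a smaller subspace loss than ordinary PCA, in line with the abstract's point that plain PCA degrades when $p$ is large relative to $n$. The exact printed numbers depend on the random seed.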
First available in Project Euclid: 8 May 2013
Primary: 62H12 (Estimation)
Secondary: 62G20 (Asymptotic properties); 62H25 (Factor analysis and principal components; correspondence analysis)
Ma, Zongming. Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 (2013), no. 2, 772-801. doi:10.1214/13-AOS1097. https://projecteuclid.org/euclid.aos/1368018173
- Supplementary material: Supplement to “Sparse principal component analysis and iterative thresholding”. The supplement gives proofs of Corollaries 3.1 and 3.2, Proposition 3.1, and all the claims in Section 6.