Open Access
PCA consistency in high dimension, low sample size context
Sungkyu Jung, J. S. Marron
Ann. Statist. 37(6B): 4104-4130 (December 2009). DOI: 10.1214/09-AOS709


Principal Component Analysis (PCA) is an important tool for dimension reduction, especially when the dimension (or the number of variables) is very high. Asymptotic studies in which the sample size is fixed and the dimension grows [i.e., High Dimension, Low Sample Size (HDLSS)] are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLSS asymptotics are used to study consistency, strong inconsistency and subspace consistency. We show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency), and most other PC directions are strongly inconsistent. Broad sets of sufficient conditions for each of these cases are specified, and the main theorem gives a catalogue of possible combinations. In preparation for these results, we show that the geometric representation of HDLSS data holds under general conditions, which include a ρ-mixing condition and a broad range of sphericity measures of the covariance matrix.
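
The contrast between consistent and strongly inconsistent PC directions can be illustrated with a small simulation. The following is only a sketch under assumptions that are not part of the abstract: Gaussian data with known zero mean, a single-spike covariance diag(d^alpha, 1, ..., 1), a fixed sample size n = 10, and the hypothetical helper name `first_pc_angle`. Under this setup one would expect the angle between the first sample PC direction and the first population eigenvector to shrink as d grows when alpha is large, and to drift toward 90 degrees when alpha is small.

```python
import numpy as np


def first_pc_angle(d, n, alpha, rng):
    """Angle in degrees between the first sample PC direction and e_1.

    Assumed model (illustrative only): n i.i.d. N(0, Sigma) columns with
    Sigma = diag(d**alpha, 1, ..., 1), so e_1 is the first population
    eigenvector. The mean is taken as known (zero), so no centering is done.
    """
    scales = np.ones(d)
    scales[0] = d ** (alpha / 2.0)  # standard deviation of the spiked coordinate
    X = rng.standard_normal((d, n)) * scales[:, None]  # d x n data matrix

    # The left singular vectors of X are the eigenvectors of the sample
    # covariance X X^T / n; the thin SVD avoids forming a d x d matrix.
    U, _, _ = np.linalg.svd(X, full_matrices=False)

    # The sign of a singular vector is arbitrary, so use the absolute value.
    cos = min(1.0, abs(U[0, 0]))
    return float(np.degrees(np.arccos(cos)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10  # sample size stays fixed while the dimension d grows
    for alpha in (0.5, 1.5):
        for d in (100, 1000, 10000):
            angle = first_pc_angle(d, n, alpha, rng)
            print(f"alpha={alpha}, d={d}: angle = {angle:.1f} degrees")
```

The thin SVD of the d × n data matrix is used because n is much smaller than d, which is the HDLSS regime the abstract describes.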



Sungkyu Jung, J. S. Marron. "PCA consistency in high dimension, low sample size context." Ann. Statist. 37 (6B) 4104-4130, December 2009.


Published: December 2009
First available in Project Euclid: 23 October 2009

zbMATH: 1191.62108
MathSciNet: MR2572454
Digital Object Identifier: 10.1214/09-AOS709

Primary: 34L20, 62H25
Secondary: 62F12

Keywords: consistency and strong inconsistency, high dimension, low sample size data, nonstandard asymptotics, principal component analysis, sample covariance matrix, spiked population model, ρ-mixing

Rights: Copyright © 2009 Institute of Mathematical Statistics

Vol. 37 • No. 6B • December 2009