Brazilian Journal of Probability and Statistics

PCA and eigen-inference for a spiked covariance model with largest eigenvalues of same asymptotic order

Addy Bolivar-Cime and Victor Perez-Abreu


Abstract

We work in the high-dimension, low-sample-size (HDLSS) setting, in which the dimension $d$ of the data exceeds the sample size $n$. We study the asymptotics of the first $p\geq2$ sample eigenvalues and their corresponding eigenvectors under a spiked covariance model in which the $p$ largest population eigenvalues have the same asymptotic order of magnitude as $d$ tends to infinity and the remaining eigenvalues are constant. We obtain the asymptotic joint distribution of the nonzero sample eigenvalues when $d\rightarrow\infty$ and the sample size $n$ is fixed. We then prove that the $p$ largest sample eigenvalues increase jointly at the same speed as their population counterparts, in the sense that the vector of ratios of sample to population eigenvalues converges to a multivariate distribution when $d\rightarrow\infty$ and $n$ is fixed, and to the vector of ones when both $d,n\rightarrow\infty$ and $d\gg n$. We also show the subspace consistency of the corresponding sample eigenvectors when $d\rightarrow\infty$ and $n$ is fixed. Furthermore, using the asymptotic joint distribution of the sample eigenvalues, we study some inference problems for the spiked covariance model, and we propose hypothesis tests for a particular case of this model as well as confidence intervals for the $p$ largest eigenvalues. A simulation study assesses the behavior of the proposed statistical methodologies.
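The ratio statement in the abstract can be explored numerically. The sketch below is illustrative only and is not the authors' simulation: it assumes Gaussian data with known zero mean, a diagonal population covariance whose $p$ largest eigenvalues are $c_i d^{\alpha}$, and unit eigenvalues elsewhere. The function name, the constants `c`, and the exponent `alpha` are hypothetical choices, since the abstract only states that the $p$ largest eigenvalues share the same asymptotic order. Because $d\gg n$, the nonzero sample eigenvalues are computed from the $n\times n$ dual matrix $X^{\top}X/n$, which has the same nonzero eigenvalues as the $d\times d$ sample covariance $XX^{\top}/n$.

```python
import numpy as np

def spiked_eigenvalue_ratios(d, n, c=(2.0, 1.0), alpha=1.5, seed=None):
    """Ratios of the p largest sample eigenvalues to their population values.

    Assumed model (illustrative): population eigenvalues c_i * d**alpha for
    i < p and 1 otherwise, zero-mean Gaussian data, and p <= n.
    """
    rng = np.random.default_rng(seed)
    p = len(c)
    lam = np.ones(d)
    lam[:p] = np.asarray(c, dtype=float) * d**alpha
    # Columns of X are n independent N(0, diag(lam)) observations.
    X = np.sqrt(lam)[:, None] * rng.standard_normal((d, n))
    # The n x n dual matrix shares its nonzero eigenvalues with X X^T / n.
    dual = X.T @ X / n
    sample = np.linalg.eigvalsh(dual)[::-1]  # sorted in descending order
    return sample[:p] / lam[:p]

# Monte Carlo look at the ratio vector for fixed n and for a larger n.
for n in (5, 50):
    ratios = np.array(
        [spiked_eigenvalue_ratios(d=1000, n=n, seed=s) for s in range(50)]
    )
    print(n, ratios.mean(axis=0), ratios.std(axis=0))
```

Under these assumptions, the printed standard deviations give a rough view of how the spread of the ratios changes as $n$ grows with $d\gg n$. The run is a qualitative check, not a reproduction of the paper's results.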

Article information

Source
Braz. J. Probab. Stat., Volume 28, Number 2 (2014), 255-274.

Dates
First available in Project Euclid: 4 April 2014

Permanent link to this document
https://projecteuclid.org/euclid.bjps/1396615440

Digital Object Identifier
doi:10.1214/12-BJPS205

Mathematical Reviews number (MathSciNet)
MR3189497

Zentralblatt MATH identifier
1319.62118

Keywords
Principal Component Analysis; spiked covariance model; eigen-inference; hypothesis test; confidence interval; high dimensional data; HDLSS

Citation

Bolivar-Cime, Addy; Perez-Abreu, Victor. PCA and eigen-inference for a spiked covariance model with largest eigenvalues of same asymptotic order. Braz. J. Probab. Stat. 28 (2014), no. 2, 255-274. doi:10.1214/12-BJPS205. https://projecteuclid.org/euclid.bjps/1396615440


