The Annals of Statistics

Optimality and sub-optimality of PCA I: Spiked random matrix models

Abstract

A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or “spike”) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including nonspectral tests. Our results leverage Le Cam’s notion of contiguity and include:

(i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike.

(ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries.

(iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
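
The spectral phase transition described above can be illustrated numerically. The following is a minimal sketch, not the authors' method: it simulates a rank-one spiked Gaussian Wigner matrix and compares the top eigenvalue with the classical limit (2 below the threshold, λ + 1/λ above it). The normalization (unit-norm spike, noise scaled by 1/√n, threshold at λ = 1), the Rademacher prior, and the function names `spiked_wigner`, `top_eigenvalue` and `predicted_top_eigenvalue` are assumptions made for illustration; the abstract does not fix them.

```python
import numpy as np


def spiked_wigner(n, lam, rng):
    """Sample Y = lam * x x^T + W / sqrt(n) (assumed normalization).

    W is a GOE-style matrix: symmetric, off-diagonal variance 1,
    diagonal variance 2. The spike x is drawn from a Rademacher
    prior and scaled to unit norm.
    """
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    return lam * np.outer(x, x) + w / np.sqrt(n)


def top_eigenvalue(y):
    """Largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(y)[-1]


def predicted_top_eigenvalue(lam):
    """Asymptotic top eigenvalue: 2 if lam <= 1, else lam + 1/lam."""
    return lam + 1.0 / lam if lam > 1.0 else 2.0


rng = np.random.default_rng(0)
n = 500
for lam in (0.0, 0.5, 1.5, 2.0):
    observed = top_eigenvalue(spiked_wigner(n, lam, rng))
    print(f"lam={lam:.1f}  observed={observed:.3f}  "
          f"predicted={predicted_top_eigenvalue(lam):.3f}")
```

For λ below 1 the observed top eigenvalue stays near the bulk edge at 2, so it carries no sign of the spike; above 1 it separates from the bulk. This sketch covers only the spectral (PCA) test. The nonspectral tests and contiguity arguments studied in the paper are not represented here.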

Article information

Source
Ann. Statist., Volume 46, Number 5 (2018), 2416–2451.

Dates
Revised: July 2017
First available in Project Euclid: 17 August 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1534492840

Digital Object Identifier
doi:10.1214/17-AOS1625

Mathematical Reviews number (MathSciNet)
MR3845022

Zentralblatt MATH identifier
06964337

Citation

Perry, Amelia; Wein, Alexander S.; Bandeira, Afonso S.; Moitra, Ankur. Optimality and sub-optimality of PCA I: Spiked random matrix models. Ann. Statist. 46 (2018), no. 5, 2416–2451. doi:10.1214/17-AOS1625. https://projecteuclid.org/euclid.aos/1534492840

References

• Amini, A. A. and Wainwright, M. J. (2008). High-dimensional analysis of semidefinite relaxations for sparse principal components. In IEEE International Symposium on Information Theory 2454–2458.
• Anderson, G. W., Guionnet, A. and Zeitouni, O. (2010). An Introduction to Random Matrices. Cambridge Studies in Advanced Mathematics 118. Cambridge Univ. Press, Cambridge.
• Arias-Castro, E., Bubeck, S. and Lugosi, G. (2012). Detection of correlations. Ann. Statist. 40 412–435.
• Arias-Castro, E., Candès, E. J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278–304.
• Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533–2556.
• Arias-Castro, E. and Verzelen, N. (2014). Community detection in dense random networks. Ann. Statist. 42 940–969.
• Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer, New York.
• Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697.
• Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
• Bandeira, A. S., Boumal, N. and Singer, A. (2014). Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Available at arXiv:1411.3272.
• Banks, J., Moore, C., Neeman, J. and Netrapalli, P. (2016). Information-theoretic thresholds for community detection in sparse networks. In 29th Annual Conference on Learning Theory 383–416.
• Banks, J., Moore, C., Verzelen, N., Vershynin, R. and Xu, J. (2017). Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. Available at arXiv:1607.05222.
• Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T. and Zdeborová, L. (2016). Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. Available at arXiv:1606.04142.
• Barron, A. R. (1986). Entropy and the central limit theorem. Ann. Probab. 14 336–342.
• Bayati, M. and Montanari, A. (2011). The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory 57 764–785.
• Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227 494–521.
• Berthet, Q. and Rigollet, P. (2013a). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
• Berthet, Q. and Rigollet, P. (2013b). Complexity theoretic lower bounds for sparse principal component detection. In COLT 1046–1066.
• Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
• Boumal, N. (2016). Nonconvex phase synchronization. SIAM J. Optim. 26 2355–2377.
• Boumal, N., Singer, A., Absil, P.-A. and Blondel, V. D. (2014). Cramér–Rao bounds for synchronization of rotations. Inf. Inference 3 1–39.
• Brown, L. D. (1982). A proof of the central limit theorem motivated by the Cramér–Rao inequality. In Statistics and Probability: Essays in Honor of C. R. Rao (G. Kallianpur, P. Krishnaiah and J. Ghosh, eds.) 141–148. North-Holland, Amsterdam.
• Butucea, C. and Ingster, Y. I. (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 2652–2688.
• Cai, T. T., Jin, J. and Low, M. G. (2007). Estimation and confidence sets for sparse normal mixtures. Ann. Statist. 35 2421–2449.
• Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
• Cai, T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161 781–815.
• Capitaine, M., Donati-Martin, C. and Féral, D. (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann. Probab. 37 1–47.
• Deshpande, Y., Abbe, E. and Montanari, A. (2016). Asymptotic mutual information for the binary stochastic block model. In 2016 IEEE International Symposium on Information Theory 185–189.
• Deshpande, Y. and Montanari, A. (2014a). Sparse PCA via covariance thresholding. In Advances in Neural Information Processing Systems 334–342.
• Deshpande, Y. and Montanari, A. (2014b). Information-theoretically optimal sparse PCA. In IEEE International Symposium on Information Theory 2197–2201.
• Deshpande, Y., Montanari, A. and Richard, E. (2014). Cone-constrained principal component analysis. In Advances in Neural Information Processing Systems 2717–2725.
• Dobriban, E. (2017). Sharp detection in PCA under correlations: All eigenvalues matter. Ann. Statist. 45 1810–1833.
• Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
• Donoho, D. L., Maleki, A. and Montanari, A. (2009). Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 106 18914–18919.
• Egloff, D., Leippold, M. and Wu, L. (2010). The term structure of variance swap rates and optimal variance swap investments. J. Financ. Quant. Anal. 45 1279–1310.
• Féral, D. and Péché, S. (2007). The largest eigenvalue of rank one deformation of large Wigner matrices. Comm. Math. Phys. 272 185–228.
• Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Rev. Econ. Stat. 82 540–554.
• Guerra, F. (2003). Broken replica symmetry bounds in the mean field spin glass model. Comm. Math. Phys. 233 1–12.
• Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
• Janson, S. (1995). Random regular graphs: Asymptotic distributions and contiguity. Combin. Probab. Comput. 4 369–405.
• Javanmard, A. and Montanari, A. (2013). State evolution for general approximate message passing algorithms, with applications to spatial coupling. Inf. Inference 2 115–144.
• Javanmard, A., Montanari, A. and Ricci-Tersenghi, F. (2016). Phase transitions in semidefinite relaxations. Proc. Natl. Acad. Sci. USA 113 E2218–E2223.
• Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
• Johnstone, I. M. and Lu, A. Y. (2004). Sparse principal components analysis. Unpublished manuscript 7.
• Johnstone, I. M. and Onatski, A. (2015). Testing in high-dimensional spiked models. Available at arXiv:1509.07269.
• Kannan, R. and Vempala, S. (2016). Beyond spectral: Tight bounds for planted Gaussians. Available at arXiv:1608.03643.
• Ke, Z. T. (2016). Detecting rare and weak spikes in large covariance matrices. Available at arXiv:1609.00883.
• Krauthgamer, R., Nadler, B. and Vilenchik, D. (2015). Do semidefinite relaxations solve sparse PCA up to the information limit? Ann. Statist. 43 1300–1322.
• Krzakala, F., Xu, J. and Zdeborová, L. (2016). Mutual information in rank-one matrix estimation. Available at arXiv:1603.08447.
• Lelarge, M. and Miolane, L. (2016). Fundamental limits of symmetric low-rank matrix estimation. Available at arXiv:1611.03888.
• Lesieur, T., Krzakala, F. and Zdeborová, L. (2015a). Phase transitions in sparse PCA. In IEEE International Symposium on Information Theory 1635–1639.
• Lesieur, T., Krzakala, F. and Zdeborová, L. (2015b). MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) 680–687.
• Le Cam, L. (1960). Locally Asymptotically Normal Families of Distributions: Certain Approximations to Families of Distributions and Their Use in the Theory of Estimation and Testing Hypotheses. Univ. California Press, Berkeley, CA.
• Litterman, R. B. and Scheinkman, J. (1991). Common factors affecting bond returns. J. Fixed Income 1 54–61.
• Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
• Ma, Z. and Wu, Y. (2015). Computational barriers in minimax submatrix detection. Ann. Statist. 43 1089–1116.
• McSherry, F. (2001). Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science 529–537.
• Molloy, M. S. O., Robalewska, H., Robinson, R. W. and Wormald, N. C. (1997). $1$-factorizations of random regular graphs. Random Structures Algorithms 10 305–321.
• Montanari, A., Reichman, D. and Zeitouni, O. (2015). On the limitation of spectral methods: From the Gaussian hidden clique problem to rank-one perturbations of Gaussian tensors. In Advances in Neural Information Processing Systems 217–225.
• Montanari, A. and Richard, E. (2016). Non-negative principal component analysis: Message passing algorithms and sharp asymptotics. IEEE Trans. Inform. Theory 62 1458–1484.
• Mossel, E., Neeman, J. and Sly, A. (2015). Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields 162 431–461.
• Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
• Onatski, A., Moreira, M. J. and Hallin, M. (2013). Asymptotic power of sphericity tests for high-dimensional data. Ann. Statist. 41 1204–1231.
• Onatski, A., Moreira, M. J. and Hallin, M. (2014). Signal detection in high dimension: The multispiked case. Ann. Statist. 42 225–254.
• Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
• Péché, S. (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probab. Theory Related Fields 134 127–173.
• Perry, A., Wein, A. S. and Bandeira, A. S. (2016). Statistical limits of spiked tensor models. Available at arXiv:1612.07728.
• Perry, A., Wein, A. S., Bandeira, A. S. and Moitra, A. (2017). Supplement to “Optimality and sub-optimality of PCA I: Spiked random matrix models.” DOI:10.1214/17-AOS1625SUPP.
• Pitman, E. J. G. (1979). Some Basic Theory for Statistical Inference. Chapman & Hall, London; A Halsted Press Book, Wiley, New York.
• Pizzo, A., Renfrew, D. and Soshnikov, A. (2013). On finite rank deformations of Wigner matrices. Ann. Inst. Henri Poincaré Probab. Stat. 49 64–94.
• Rangan, S. and Fletcher, A. K. (2012). Iterative estimation of constrained rank-one matrices in noise. In IEEE International Symposium on Information Theory 1246–1250.
• Robinson, R. W. and Wormald, N. C. (1994). Almost all regular graphs are Hamiltonian. Random Structures Algorithms 5 363–374.
• Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. J. Multivariate Anal. 115 317–333.
• Singer, A. (2011). Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal. 30 20–36.
• Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97 1167–1179.
• Sun, X. and Nobel, A. B. (2008). On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res. 9 2431–2453.
• Sun, X. and Nobel, A. B. (2013). On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix. Bernoulli 19 275–294.
• Tao, T. (2012). Topics in Random Matrix Theory. Graduate Studies in Mathematics 132. Amer. Math. Soc., Providence, RI.
• Tao, T. and Vu, V. (2014). Random matrices: The universality phenomenon for Wigner ensembles. In Modern Aspects of Random Matrix Theory. Proc. Sympos. Appl. Math. 72 121–172. Amer. Math. Soc., Providence, RI.
• Verzelen, N. and Arias-Castro, E. (2015). Community detection in sparse random networks. Ann. Appl. Probab. 25 3465–3510.
• Vu, V. Q. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS) 1278–1286.
• Wormald, N. C. (1999). Models of random regular graphs. In Surveys in Combinatorics, 1999 (Canterbury). London Mathematical Society Lecture Note Series 267 239–298. Cambridge Univ. Press, Cambridge.

Supplemental materials

• Optimality and sub-optimality of PCA in spiked random matrix models: Supplementary proofs. Contains proofs omitted from this paper for the sake of length.