The Annals of Statistics

Do semidefinite relaxations solve sparse PCA up to the information limit?

Robert Krauthgamer, Boaz Nadler, and Dan Vilenchik

Abstract

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms have been developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is: under what conditions can such algorithms recover the sparse principal components? We study this question for a single-spike model with an $\ell_{0}$-sparse eigenvector, in the asymptotic regime as dimension $p$ and sample size $n$ both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877–2921] proved that for sparsity levels $k\geq\Omega(n/\log p)$, no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for $k\leq O(\sqrt{n/\log p})$, diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between computational and information limits. We prove that when $k\geq\Omega(\sqrt{n})$, the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally efficient algorithm can recover a spike of $\ell_{0}$-sparsity $k\geq\Omega(\sqrt{n})$. Finally, we present empirical results suggesting that up to sparsity levels $k=O(\sqrt{n})$, recovery is possible by a simple covariance thresholding algorithm.
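Illustrative sketch. In the single-spike (spiked covariance) model above, each sample has the form $x_{i}=\sqrt{\beta}\,u_{i}v+z_{i}$, where $v$ is a $k$-sparse unit vector, $u_{i}\sim N(0,1)$ and $z_{i}\sim N(0,I_{p})$; the SDP in question is the relaxation of d'Aspremont et al. [12], which maximizes $\operatorname{tr}(SX)$ over $X\succeq0$ subject to $\operatorname{tr}(X)=1$ and an $\ell_{1}$ constraint. The Python sketch below illustrates the two thresholding schemes mentioned in the abstract under these assumptions; the data generator, the top-$k$ variance selection rule, and the threshold parameter tau are illustrative choices for this sketch, not the paper's exact procedures.

import numpy as np

def single_spike_sample(n, p, k, beta, rng):
    """Draw n samples x_i = sqrt(beta) * u_i * v + z_i with a k-sparse unit spike v."""
    v = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    v[support] = 1.0 / np.sqrt(k)             # k-sparse, unit-norm spike
    u = rng.standard_normal(n)                 # spike coefficients u_i ~ N(0, 1)
    z = rng.standard_normal((n, p))            # ambient Gaussian noise
    return np.sqrt(beta) * np.outer(u, v) + z, v

def diagonal_thresholding(X, k):
    """Johnstone-Lu style diagonal thresholding (top-k variant, k assumed known):
    keep the k coordinates of largest sample variance, then run PCA on that block."""
    n, p = X.shape
    keep = np.argsort((X ** 2).mean(axis=0))[-k:]
    S = X[:, keep].T @ X[:, keep] / n
    _, V = np.linalg.eigh(S)                   # eigh returns eigenvalues in ascending order
    v_hat = np.zeros(p)
    v_hat[keep] = V[:, -1]                     # embed the leading eigenvector back into R^p
    return v_hat

def covariance_thresholding(X, tau):
    """Hard-thresholding variant of covariance thresholding: zero out entries of the
    sample covariance below tau / sqrt(n), keep the diagonal, take the top eigenvector."""
    n, p = X.shape
    S = X.T @ X / n
    T = np.where(np.abs(S) >= tau / np.sqrt(n), S, 0.0)
    np.fill_diagonal(T, np.diag(S))            # leave the diagonal intact
    _, V = np.linalg.eigh(T)
    return V[:, -1]

rng = np.random.default_rng(0)
X, v = single_spike_sample(n=500, p=1000, k=20, beta=5.0, rng=rng)
v_hat = covariance_thresholding(X, tau=4.0)
overlap = abs(v_hat @ v)                       # overlap near 1 indicates recovery

In the regime $k=\Theta(\sqrt{n})$ that the abstract's last sentence refers to, diagonal thresholding is no longer consistent, while covariance thresholding can still succeed empirically.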

Article information

Source
Ann. Statist., Volume 43, Number 3 (2015), 1300–1322.

Dates
Received: September 2014
Revised: January 2015
First available in Project Euclid: 15 May 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1431695645

Digital Object Identifier
doi:10.1214/15-AOS1310

Mathematical Reviews number (MathSciNet)
MR3346704

Zentralblatt MATH identifier
1320.62138

Subjects
Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 62F12: Asymptotic properties of estimators

Keywords
Principal component analysis; spectral analysis; spiked covariance ensembles; sparsity; high-dimensional statistics; convex relaxation; semidefinite programming; Wishart ensembles; random matrices; integrality gap

Citation

Krauthgamer, Robert; Nadler, Boaz; Vilenchik, Dan. Do semidefinite relaxations solve sparse PCA up to the information limit? Ann. Statist. 43 (2015), no. 3, 1300–1322. doi:10.1214/15-AOS1310. https://projecteuclid.org/euclid.aos/1431695645



References

  • [1] Alon, N., Krivelevich, M. and Sudakov, B. (1998). Finding a large hidden clique in a random graph. Random Structures Algorithms 13 457–466.
  • [2] Ames, B. P. W. and Vavasis, S. A. (2011). Nuclear norm minimization for the planted clique and biclique problems. Math. Program. 129 69–89.
  • [3] Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
  • [4] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New York.
  • [5] Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
  • [6] Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
  • [7] Berthet, Q. and Rigollet, P. (2013). Complexity theoretic lower bounds for sparse principal component detection. In COLT 1046–1066. JMLR.org.
  • [8] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • [9] Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
  • [10] Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
  • [11] d’Aspremont, A., Banerjee, O. and El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30 56–66.
  • [12] d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448.
  • [13] Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
  • [14] Dekel, Y., Gurel-Gurevich, O. and Peres, Y. (2014). Finding hidden cliques in linear time with high probability. Combin. Probab. Comput. 23 29–49.
  • [15] Deshpande, Y. and Montanari, A. (2013). Finding hidden cliques of size $\sqrt{n/e}$ in nearly linear time. Available at arXiv:1304.7047.
  • [16] Deshpande, Y. and Montanari, A. (2014). Information-theoretically optimal sparse PCA. Available at arXiv:1402.2238.
  • [17] El Karoui, N. (2003). On the largest eigenvalue of Wishart matrices with identity covariance when $n,p$ and $p/n\to\infty$. Available at arXiv:math/0309355.
  • [18] Feige, U. and Krauthgamer, R. (2000). Finding and certifying a large hidden clique in a semirandom graph. Random Structures Algorithms 16 195–208.
  • [19] Feige, U. and Ron, D. (2010). Finding hidden cliques in linear time. In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10) 189–203. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.
  • [20] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • [21] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • [22] Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed. Springer, New York.
  • [23] Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
  • [24] Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1302–1338.
  • [25] Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 30 1081–1102.
  • [26] Lei, J. and Vu, V. Q. (2015). Sparsistency and agnostic inference in sparse PCA. Ann. Statist. 43 299–322.
  • [27] Lu, Z. and Zhang, Y. (2012). An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 135 149–193.
  • [28] Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
  • [29] Moghaddam, B., Weiss, Y. and Avidan, S. (2006). Generalized spectral bounds for sparse LDA. In Proceedings of the 23rd International Conference on Machine Learning 641–648. ACM, New York.
  • [30] Moghaddam, B., Weiss, Y. and Avidan, S. (2006). Spectral bounds for sparse PCA: Exact and greedy algorithms. In Advances in Neural Information Processing Systems 915–922. MIT Press, Cambridge.
  • [31] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
  • [32] Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
  • [33] Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM J. Comput. 24 227–234.
  • [34] Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015–1034.
  • [35] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Academic Press, Boston, MA.
  • [36] Sturm, J. F. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11/12 625–653.
  • [37] Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. Ann. Statist. 41 2905–2947.
  • [38] Wang, T., Berthet, Q. and Samworth, R. (2014). Statistical and computational trade-offs in estimation of sparse principal components. Available at arXiv:1408.5369.
  • [39] Witten, D., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
  • [40] Zhang, Z., Zha, H. and Simon, H. (2002). Low-rank approximations with sparse factors. I. Basic algorithms and error analysis. SIAM J. Matrix Anal. Appl. 23 706–727 (electronic).
  • [41] Wang, Z., Lu, H. and Liu, H. (2014). Nonconvex statistical optimization: Minimax-optimal sparse PCA in polynomial time. Available at arXiv:1408.5352.
  • [42] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.