The Annals of Statistics

Statistical and computational trade-offs in estimation of sparse principal components

Abstract

In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.
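The estimation task described in the abstract — recovering the leading eigenvector of a covariance matrix under a sparsity assumption — can be illustrated with a minimal diagonal-thresholding sketch in the spirit of Johnstone and Lu (2009), cited in the reference list below. The function name, the spiked-model simulation and all parameter choices here are illustrative assumptions, not the estimators analysed in the paper.

```python
import numpy as np

def sparse_leading_eigvec(X, k):
    """Illustrative k-sparse estimate of the leading eigenvector.

    Selects the k coordinates with largest sample variance (diagonal
    thresholding), then takes the top eigenvector of the corresponding
    k x k submatrix of the sample covariance matrix.
    """
    n, p = X.shape
    S = X.T @ X / n                                  # sample covariance (mean-zero data)
    support = np.sort(np.argsort(np.diag(S))[-k:])   # k largest diagonal entries
    _, vecs = np.linalg.eigh(S[np.ix_(support, support)])
    v = np.zeros(p)
    v[support] = vecs[:, -1]                         # eigh sorts eigenvalues ascending
    return v

# Spiked covariance simulation: Sigma = I + theta * u u^T with a k-sparse u.
rng = np.random.default_rng(0)
n, p, k, theta = 2000, 50, 5, 3.0
u = np.zeros(p)
u[:k] = 1.0 / np.sqrt(k)
X = rng.standard_normal((n, p)) + np.sqrt(theta) * rng.standard_normal((n, 1)) * u
v_hat = sparse_leading_eigvec(X, k)
overlap = abs(u @ v_hat)                             # close to 1 when recovery succeeds
```

Diagonal thresholding runs in polynomial time but is not minimax optimal in general; the gap between such fast procedures and the optimal rate is exactly the trade-off the paper studies.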

Article information

Source
Ann. Statist., Volume 44, Number 5 (2016), 1896–1930.

Dates
Revised: July 2015
First available in Project Euclid: 12 September 2016

https://projecteuclid.org/euclid.aos/1473685263

Digital Object Identifier
doi:10.1214/15-AOS1369

Mathematical Reviews number (MathSciNet)
MR3546438

Zentralblatt MATH identifier
1349.62254

Citation

Wang, Tengyao; Berthet, Quentin; Samworth, Richard J. Statistical and computational trade-offs in estimation of sparse principal components. Ann. Statist. 44 (2016), no. 5, 1896–1930. doi:10.1214/15-AOS1369. https://projecteuclid.org/euclid.aos/1473685263

References

• Allen, G. I. and Maletić-Savatić, M. (2011). Sparse non-negative generalized PCA with applications to metabolomics. Bioinformatics 27 3029–3035.
• Alon, N., Krivelevich, M. and Sudakov, B. (1998). Finding a large hidden clique in a random graph. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, CA, 1998) 594–598. ACM, New York.
• Alon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R. and Xie, N. (2007). Testing $k$-wise and almost $k$-wise independence. In STOC’07—Proceedings of the 39th Annual ACM Symposium on Theory of Computing 496–505. ACM, New York.
• Ames, B. P. W. and Vavasis, S. A. (2011). Nuclear norm minimization for the planted clique and biclique problems. Math. Program. 129 69–89.
• Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
• Applebaum, B., Barak, B. and Wigderson, A. (2010). Public-key cryptography from different assumptions. In STOC’10—Proceedings of the 2010 ACM International Symposium on Theory of Computing 171–180. ACM, New York.
• Bach, F., Ahipaşaoğlu, S. D. and d’Aspremont, A. (2010). Convex relaxations for subset selection. Available at arXiv:1006.3601.
• Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697.
• Berthet, Q. (2015). Optimal testing for planted satisfiability problems. Electron. J. Stat. 9 298–317.
• Berthet, Q. and Rigollet, P. (2013a). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
• Berthet, Q. and Rigollet, P. (2013b). Complexity theoretic lower bounds for sparse principal component detection. J. Mach. Learn. Res. W&CP 30 1046–1066.
• Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
• Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
• Chan, Y.-b. and Hall, P. (2010). Using evidence of mixed populations to select variables for clustering very high-dimensional data. J. Amer. Statist. Assoc. 105 798–809.
• Chandrasekaran, V. and Jordan, M. I. (2013). Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. USA 110 E1181–E1190.
• Chen, Y. and Xu, J. (2014). Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. Available at arXiv:1402.1267.
• Chun, H. and Keleş, S. (2009). Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics 182 79–90.
• d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448 (electronic).
• Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7 1–46.
• Deshpande, Y. and Montanari, A. (2014). Sparse PCA via covariance thresholding. Available at arXiv:1311.5179.
• Diaconis, P. and Freedman, D. (1980). Finite exchangeable sequences. Ann. Probab. 8 745–764.
• Feige, U. and Krauthgamer, R. (2000). Finding and certifying a large hidden clique in a semirandom graph. Random Structures Algorithms 16 195–208.
• Feige, U. and Krauthgamer, R. (2003). The probable value of the Lovász–Schrijver relaxations for maximum independent set. SIAM J. Comput. 32 345–370 (electronic).
• Feige, U. and Ron, D. (2010). Finding hidden cliques in linear time. In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10) 189–203. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.
• Feldman, V., Perkins, W. and Vempala, S. (2015). On the complexity of random satisfiability problems with planted solutions. In STOC’15—Proceedings of the 2015 ACM Symposium on Theory of Computing 77–86. ACM, New York.
• Feldman, V., Grigorescu, E., Reyzin, L., Vempala, S. S. and Xiao, Y. (2013). Statistical algorithms and a lower bound for detecting planted cliques. In STOC’13—Proceedings of the 2013 ACM Symposium on Theory of Computing 655–664. ACM, New York.
• Gao, C., Ma, Z. and Zhou, H. H. (2014). Sparse CCA: Adaptive estimation and computational barriers. Available at arXiv:1409.8565.
• Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Johns Hopkins Univ. Press, Baltimore, MD.
• Grimmett, G. R. and McDiarmid, C. J. H. (1975). On colouring random graphs. Math. Proc. Cambridge Philos. Soc. 77 313–324.
• Hajek, B., Wu, Y. and Xu, J. (2014). Computational lower bounds for community detection on random graphs. Available at arXiv:1406.6625.
• Hazan, E. and Krauthgamer, R. (2011). How hard is it to approximate the best Nash equilibrium? SIAM J. Comput. 40 79–91.
• Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge Univ. Press, Cambridge.
• Jerrum, M. (1992). Large cliques elude the Metropolis process. Random Structures Algorithms 3 347–359.
• Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
• Journée, M., Nesterov, Y., Richtárik, P. and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11 517–553.
• Juels, A. and Peinado, M. (2000). Hiding cliques for cryptographic security. Des. Codes Cryptogr. 20 269–280.
• Karp, R. M. (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations (R. E. Miller et al., eds.) 85–103. Plenum, New York.
• Kučera, L. (1995). Expected complexity of graph partitioning problems. Discrete Appl. Math. 57 193–212.
• Lanczos, C. (1950). An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. 45 255–282.
• Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1302–1338.
• Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
• Ma, Z. and Wu, Y. (2015). Computational barriers in minimax submatrix detection. Ann. Statist. 43 1089–1116.
• Majumdar, A. (2009). Image compression by sparse PCA coding in curvelet domain. Signal Image Video Process. 3 27–34.
• Naikal, N., Yang, A. Y. and Sastry, S. S. (2011). Informative feature selection for object recognition via sparse PCA. In 2011 IEEE International Conference on Computer Vision (ICCV) 818–825. IEEE, Barcelona, Spain.
• Nemirovski, A. (2004). Prox-method with rate of convergence $O(1/t)$ for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15 229–251 (electronic).
• Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Math. Program. 103 127–152.
• Parkhomenko, E., Tritchler, D. and Beyene, J. (2009). Sparse canonical correlation analysis with application to genomic data integration. Stat. Appl. Genet. Mol. Biol. 8 Art. 1, 36.
• Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
• Samworth, R. J. (2016). Peter Hall’s work on high-dimensional data and classification. Ann. Statist. To appear.
• Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. J. Multivariate Anal. 115 317–333.
• Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
• Tan, K. M., Petersen, A. and Witten, D. (2014). Classification of RNA-seq data. In Statistical Analysis of Next Generation Sequencing Data (S. Datta and D. Witten, eds.) 219–246. Springer, Cham.
• van de Geer, S. (2000). Empirical Processes in $M$-Estimation. Cambridge Univ. Press, Cambridge.
• Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. Ann. Statist. 41 2905–2947.
• Vu, V. Q., Cho, J., Lei, J. and Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in Neural Information Processing Systems (NIPS) 26 2670–2678.
• Wang, T., Berthet, Q. and Samworth, R. J. (2015). Supplement to “Statistical and computational trade-offs in estimation of sparse principal components”. DOI:10.1214/15-AOS1369SUPP.
• Wang, Z., Lu, H. and Liu, H. (2014). Tighten after relax: Minimax-optimal sparse PCA in polynomial time. In Advances in Neural Information Processing Systems (NIPS) 27 3383–3391.
• Wang, D., Lu, H. and Yang, M.-H. (2013). Online object tracking with sparse prototypes. IEEE Trans. Image Process. 22 314–325.
• Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
• Yu, Y., Wang, T. and Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102 315–323.
• Yuan, X.-T. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14 899–925.
• Zhang, Y., Wainwright, M. J. and Jordan, M. I. (2014). Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. J. Mach. Learn. Res. W&CP 35 921–948.
• Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.

Supplemental materials

• Supplementary material to “Statistical and computational trade-offs in estimation of sparse principal components”. Ancillary results and a brief introduction to computational complexity theory.