The Annals of Statistics

High-dimensional analysis of semidefinite relaxations for sparse principal components

Arash A. Amini and Martin J. Wainwright


Abstract

Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size θ_dia(n, p, k) = n/[k² log(p − k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size θ_sdp(n, p, k) = n/[k log(p − k)] exceeds a critical threshold. In addition, we prove that no method, including the best method, which has exponential-time complexity, can succeed in recovering the support if the order parameter θ_sdp(n, p, k) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
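Both procedures are straightforward to prototype. The sketch below is illustrative only, not code from the paper: it implements diagonal thresholding (retain the k coordinates with the largest sample variances) and an ℓ1-penalized SDP relaxation of sparse PCA of the kind analyzed in the paper. The function names, the cvxpy solver choice, and the tuning parameter lam are assumptions made for illustration.

```python
# Illustrative sketch (not from the paper): support recovery for a
# k-sparse leading eigenvector under a spiked covariance model.
# Assumptions: rows of X are i.i.d. samples; cvxpy is a stand-in
# solver choice; lam is a hypothetical tuning parameter.
import numpy as np
import cvxpy as cp

def diagonal_thresholding(X, k):
    """Return the k coordinates with the largest sample variances,
    i.e., the largest diagonal entries of the sample covariance."""
    sample_var = np.var(X, axis=0, ddof=1)       # diagonal of sample covariance
    return np.sort(np.argsort(sample_var)[-k:])  # indices of the top-k variances

def sdp_relaxation(X, lam, tol=1e-6):
    """l1-penalized SDP relaxation of sparse PCA:
        maximize  <Sigma_hat, Z> - lam * sum_ij |Z_ij|
        subject to  trace(Z) = 1,  Z positive semidefinite.
    The support estimate is the set of diagonal entries of the
    optimal Z exceeding the numerical tolerance tol."""
    sigma_hat = np.cov(X, rowvar=False)          # p x p sample covariance
    p = sigma_hat.shape[0]
    Z = cp.Variable((p, p), PSD=True)            # PSD=True also enforces symmetry
    objective = cp.Maximize(cp.trace(sigma_hat @ Z) - lam * cp.sum(cp.abs(Z)))
    cp.Problem(objective, [cp.trace(Z) == 1]).solve()
    return np.where(np.diag(Z.value) > tol)[0]
```

Consistent with the abstract, diagonal thresholding needs on the order of k² log(p − k) samples to succeed, whereas the SDP succeeds with roughly k log(p − k), at the computational cost of solving a semidefinite program over p × p matrices.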

Article information

Source
Ann. Statist., Volume 37, Number 5B (2009), 2877–2921.

Dates
First available in Project Euclid: 17 July 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1247836672

Digital Object Identifier
doi:10.1214/08-AOS664

Mathematical Reviews number (MathSciNet)
MR2541450

Zentralblatt MATH identifier
1173.62049

Subjects
Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 62F12: Asymptotic properties of estimators

Keywords
Principal component analysis; spectral analysis; spiked covariance ensembles; sparsity; high-dimensional statistics; convex relaxation; semidefinite programming; Wishart ensembles; random matrices

Citation

Amini, Arash A.; Wainwright, Martin J. High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 (2009), no. 5B, 2877–2921. doi:10.1214/08-AOS664. https://projecteuclid.org/euclid.aos/1247836672


