The Annals of Statistics

The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics

Abstract

The singular value matrix decomposition plays a ubiquitous role throughout statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and exploiting the geometric structure of singular values and singular vectors.

This paper provides a novel collection of technical and theoretical tools for studying the geometry of singular subspaces using the two-to-infinity norm. Motivated by preliminary deterministic Procrustes analysis, we consider a general matrix perturbation setting in which we derive a new Procrustean matrix decomposition. Together with flexible machinery developed for the two-to-infinity norm, this allows us to conduct a refined analysis of the induced perturbation geometry with respect to the underlying singular vectors even in the presence of singular value multiplicity. Our analysis yields singular vector entrywise perturbation bounds for a range of popular matrix noise models, each of which has a meaningful associated statistical inference task. In addition, we demonstrate how the two-to-infinity norm is the preferred norm in certain statistical settings. Specific applications discussed in this paper include covariance estimation, singular subspace recovery, and multiple graph inference.

Both our Procrustean matrix decomposition and the technical machinery developed for the two-to-infinity norm may be of independent interest.

Article information

Source
Ann. Statist., Volume 47, Number 5 (2019), 2405-2439.

Dates
Revised: March 2018
First available in Project Euclid: 3 August 2019

https://projecteuclid.org/euclid.aos/1564797852

Digital Object Identifier
doi:10.1214/18-AOS1752

Mathematical Reviews number (MathSciNet)
MR3988761

Citation

Cape, Joshua; Tang, Minh; Priebe, Carey E. The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. Ann. Statist. 47 (2019), no. 5, 2405--2439. doi:10.1214/18-AOS1752. https://projecteuclid.org/euclid.aos/1564797852

References

• [1] Abbe, E., Fan, J., Wang, K. and Zhong, Y. (2017). Entrywise eigenvector analysis of random matrices with low expected rank. Preprint. Available at arXiv:1709.09565.
• [2] Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer Series in Statistics. Springer, New York.
• [3] Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227 494–521.
• [4] Benidis, K., Sun, Y., Babu, P. and Palomar, D. P. (2016). Orthogonal sparse eigenvectors: A Procrustes problem. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4683–4686.
• [5] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer, New York.
• [6] Bojanczyk, A. W. and Lutoborski, A. (1999). The Procrustes problem for orthogonal Stiefel matrices. SIAM J. Sci. Comput. 21 1291–1304.
• [7] Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
• [8] Cai, T. T. and Zhang, A. (2018). Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. Ann. Statist. 46 60–89.
• [9] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717–772.
• [10] Cape, J., Tang, M. and Priebe, C. E. (2018). Signal-plus-noise matrix models: Eigenvector deviations and fluctuations. Biometrika. To appear. Available at arXiv:1802.00381.
• [11] Chen, L., Vogelstein, J. T., Lyzinski, V. and Priebe, C. E. (2016). A joint graph inference case study: The C. elegans chemical and electrical connectome. Worm 5 1–8.
• [12] Chikuse, Y. (2003). Statistics on Special Manifolds. Lecture Notes in Statistics 174. Springer, New York.
• [13] Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7 1–46.
• [14] Dryden, I. L., Koloydenko, A. and Zhou, D. (2009). Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann. Appl. Stat. 3 1102–1123.
• [15] Dryden, I. L. and Mardia, K. V. (2016). Statistical Shape Analysis with Applications in R, 2nd ed. Wiley Series in Probability and Statistics. Wiley, Chichester.
• [16] Edelman, A., Arias, T. A. and Smith, S. T. (1999). The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20 303–353.
• [17] Eldar, Y. C. and Kutyniok, G. (2012). Compressed Sensing: Theory and Applications. Cambridge Univ. Press, Cambridge.
• [18] Eldridge, J., Belkin, M. and Wang, Y. (2018). Unperturbed: Spectral analysis beyond Davis-Kahan. In: Proceedings of Algorithmic Learning Theory. Proceedings of Machine Learning Research 83, PMLR 321–358.
• [19] Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 603–680.
• [20] Fan, J., Rigollet, P. and Wang, W. (2015). Estimation of functionals of sparse covariance matrices. Ann. Statist. 43 2706–2737.
• [21] Fan, J., Wang, W. and Zhong, Y. (2017). An $\ell_{\infty}$ eigenvector perturbation bound and its application to robust covariance estimation. J. Mach. Learn. Res. 18 1–42.
• [22] Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23–39.
• [23] Gower, J. C. and Dijksterhuis, G. B. (2004). Procrustes Problems. Oxford Statistical Science Series 30. Oxford Univ. Press, Oxford.
• [24] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw. 5 109–137.
• [25] Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge Univ. Press, Cambridge.
• [26] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
• [27] Jolliffe, I. T. (1986). Principal Component Analysis. Springer Series in Statistics. Springer, New York.
• [28] Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23 110–133.
• [29] Koltchinskii, V. and Lounici, K. (2017). New asymptotic results in principal component analysis. Sankhyā A 79 254–297.
• [30] Le, C. M., Levina, E. and Vershynin, R. (2016). Optimization via low-rank approximation for community detection in networks. Ann. Statist. 44 373–400.
• [31] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 215–237.
• [32] Levin, K., Athreya, A., Tang, M., Lyzinski, V. and Priebe, C. E. (2017). A central limit theorem for an omnibus embedding of random dot product graphs. Preprint. Available at arXiv:1705.09355.
• [33] Lu, L. and Peng, X. (2013). Spectra of edge-independent random graphs. Electron. J. Combin. 20 1–18.
• [34] Lyzinski, V. (2018). Information recovery in shuffled graphs via graph matching. IEEE Trans. Inform. Theory 64 3254–3273.
• [35] Lyzinski, V., Park, Y., Priebe, C. E. and Trosset, M. (2017). Fast embedding for JOFC using the raw stress criterion. J. Comput. Graph. Statist. 26 786–802.
• [36] Lyzinski, V., Sussman, D. L., Fishkind, D. E., Pao, H., Chen, L., Vogelstein, J. T., Park, Y. and Priebe, C. E. (2015). Spectral clustering for divide-and-conquer graph matching. Parallel Comput. 47 70–87.
• [37] Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A. and Priebe, C. E. (2014). Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electron. J. Stat. 8 2905–2922.
• [38] Mao, X., Sarkar, P. and Chakrabarti, D. (2017). Estimating mixed memberships with sharp eigenvector deviations. Preprint. Available at arXiv:1709.00407.
• [39] Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
• [40] O’Rourke, S., Vu, V. and Wang, K. (2016). Eigenvectors of random matrices: A survey. J. Combin. Theory Ser. A 144 361–442.
• [41] O’Rourke, S., Vu, V. and Wang, K. (2018). Random perturbation of low rank matrices: Improving classical bounds. Linear Algebra Appl. 540 26–59.
• [42] Paul, D. and Aue, A. (2014). Random matrix theory in statistics: A review. J. Statist. Plann. Inference 150 1–29.
• [43] Priebe, C. E., Marchette, D. J., Ma, Z. and Adali, S. (2013). Manifold matching: Joint optimization of fidelity and commensurability. Braz. J. Probab. Stat. 27 377–400.
• [44] Rebrova, E. and Vershynin, R. (2018). Norms of random matrices: Local and global problems. Adv. Math. 324 40–83.
• [45] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
• [46] Rudelson, M. and Vershynin, R. (2015). Delocalization of eigenvectors of random matrices with independent entries. Duke Math. J. 164 2507–2538.
• [47] Sarkar, P. and Bickel, P. J. (2015). Role of normalization in spectral clustering for stochastic blockmodels. Ann. Statist. 43 962–990.
• [48] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Computer Science and Scientific Computing. Academic Press, Boston, MA.
• [49] Sussman, D. L., Tang, M., Fishkind, D. E. and Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107 1119–1128.
• [50] Sussman, D. L., Tang, M. and Priebe, C. E. (2014). Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Trans. Pattern Anal. Mach. Intell. 36 48–57.
• [51] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y. and Priebe, C. E. (2017). A semiparametric two-sample hypothesis testing problem for random graphs. J. Comput. Graph. Statist. 26 344–354.
• [52] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V. and Priebe, C. E. (2017). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli 23 1599–1630.
• [53] Tang, M., Cape, J. and Priebe, C. E. (2017). Asymptotically efficient estimators for stochastic blockmodels: The naive MLE, the rank-constrained MLE, and the spectral. Preprint. Available at arXiv:1710.10936.
• [54] Tang, M. and Priebe, C. E. (2018). Limit theorems for eigenvectors of the normalized Laplacian for random graphs. Ann. Statist. 46 2360–2415.
• [55] Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science. Online version 2018-02-09.
• [56] von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput. 17 395–416.
• [57] Wedin, P. (1972). Perturbation bounds in connection with singular value decomposition. BIT 12 99–111.
• [58] Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Math. Ann. 71 441–479.
• [59] Yao, J., Zheng, S. and Bai, Z. (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Series in Statistical and Probabilistic Mathematics 39. Cambridge Univ. Press, New York.
• [60] Young, S. J. and Scheinerman, E. R. (2007). Random dot product graph models for social networks. In Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science 4863 138–149. Springer, Berlin.
• [61] Yu, Y., Wang, T. and Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102 315–323.