## The Annals of Statistics

### Computational and statistical boundaries for submatrix localization in a large noisy matrix

#### Abstract

This paper studies computational and statistical boundaries for submatrix localization. Given one observation of a signal submatrix (or of several nonoverlapping signal submatrices) of magnitude $\lambda$ and size $k_{m}\times k_{n}$, embedded in a large noise matrix of size $m\times n$, the goal is to identify the support of the signal submatrix optimally, both computationally and statistically.
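One way to write the single-submatrix version of this setup is sketched below. The symbols $X$, $Z$, $R$ and $C$ are notation assumed here for illustration and do not appear in the abstract. The sub-Gaussian noise with parameter $\sigma$ is taken from the error model and the ratio $\lambda/\sigma$ mentioned in the abstract.

```latex
% Assumed notation: X is the observed matrix, Z the noise,
% R and C the unknown row and column supports of the signal.
X = \lambda\, \mathbf{1}_{R}\mathbf{1}_{C}^{\top} + Z,
\qquad R \subset \{1,\dots,m\},\ |R| = k_{m},
\qquad C \subset \{1,\dots,n\},\ |C| = k_{n},
% Entries of Z are independent, mean zero, sub-Gaussian with parameter \sigma.
% Localization means recovering the pair (R, C) from the single observation X.
```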

Two transition thresholds for the signal-to-noise ratio $\lambda/\sigma$ are established in terms of $m$, $n$, $k_{m}$ and $k_{n}$. The first threshold, $\sf SNR_{c}$, corresponds to the computational boundary. We introduce a new linear time spectral algorithm that identifies the submatrix with high probability when the signal strength is above the threshold $\sf SNR_{c}$. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. The second threshold, $\sf SNR_{s}$, captures the statistical boundary, below which no method can succeed in localization with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. In marked contrast to submatrix detection and sparse PCA, the results show an interesting phenomenon that $\sf SNR_{c}$ is always significantly larger than $\sf SNR_{s}$ under the sub-Gaussian error model, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
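The idea of spectral localization can be illustrated with a small simulation. The sketch below is not the linear time algorithm introduced in the paper: it is a naive heuristic that computes a full SVD and keeps the largest entries of the leading singular vectors. It assumes Gaussian noise, a single planted submatrix, and known $k_{m}$ and $k_{n}$. The function names `plant_submatrix` and `spectral_localize` are hypothetical.

```python
import numpy as np


def plant_submatrix(m, n, k_m, k_n, lam, sigma, rng):
    """Return an m x n Gaussian noise matrix with a k_m x k_n block raised by lam."""
    rows = rng.choice(m, size=k_m, replace=False)
    cols = rng.choice(n, size=k_n, replace=False)
    X = sigma * rng.standard_normal((m, n))
    X[np.ix_(rows, cols)] += lam  # add the signal on the planted support
    return X, np.sort(rows), np.sort(cols)


def spectral_localize(X, k_m, k_n):
    """Naive heuristic: keep the largest-magnitude entries of the top singular vectors."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    # Absolute values remove the sign ambiguity of singular vectors.
    rows_hat = np.sort(np.argsort(-np.abs(u))[:k_m])
    cols_hat = np.sort(np.argsort(-np.abs(v))[:k_n])
    return rows_hat, cols_hat


# Illustrative parameters with a strong signal (lambda / sigma = 5).
rng = np.random.default_rng(0)
X, rows, cols = plant_submatrix(200, 200, 20, 20, lam=5.0, sigma=1.0, rng=rng)
rows_hat, cols_hat = spectral_localize(X, 20, 20)
print("rows recovered:", np.array_equal(rows, rows_hat))
print("cols recovered:", np.array_equal(cols, cols_hat))
```

Lowering `lam` in this simulation makes the leading singular vectors less informative, which gives an informal feel for why a spectral method needs the signal-to-noise ratio to exceed some threshold. The simulation says nothing about the precise values of $\sf SNR_{c}$ or $\sf SNR_{s}$.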

#### Article information

Source
Ann. Statist., Volume 45, Number 4 (2017), 1403-1430.

Dates
Revised: April 2016
First available in Project Euclid: 28 June 2017

https://projecteuclid.org/euclid.aos/1498636861

Digital Object Identifier
doi:10.1214/16-AOS1488

Mathematical Reviews number (MathSciNet)
MR3670183

Zentralblatt MATH identifier
06773278

Subjects
Primary: 62C20: Minimax procedures
Secondary: 90C27: Combinatorial optimization

#### Citation

Cai, T. Tony; Liang, Tengyuan; Rakhlin, Alexander. Computational and statistical boundaries for submatrix localization in a large noisy matrix. Ann. Statist. 45 (2017), no. 4, 1403--1430. doi:10.1214/16-AOS1488. https://projecteuclid.org/euclid.aos/1498636861

#### Supplemental materials

• Supplement to “Computational and statistical boundaries for submatrix localization in a large noisy matrix”. Due to space constraints, we have relegated remaining proofs to the supplement.