
On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix

Xing Sun and Andrew B. Nobel



We investigate the maximal size of distinguished submatrices of a Gaussian random matrix. Of interest are submatrices whose entries have an average greater than or equal to a positive constant, and submatrices whose entries are well fit by a two-way ANOVA model. We identify size thresholds and associated (asymptotic) probability bounds for both large-average and ANOVA-fit submatrices. Probability bounds are obtained when the matrix and submatrices of interest are square and, in rectangular cases, when the matrix and submatrices of interest have fixed aspect ratios. Our principal result is an almost sure interval concentration result for the size of large-average submatrices in the square case.
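To make the large-average criterion concrete, the following is a minimal brute-force sketch, not the authors' method. It assumes the criterion is that the entries of a k x k submatrix (any k rows and any k columns, not necessarily contiguous) have average at least a threshold, here called `tau`. The function name `max_large_average_size` and the parameter `tau` are hypothetical names chosen for illustration. The search is exhaustive and therefore only feasible for very small matrices.

```python
import itertools
import random


def max_large_average_size(matrix, tau):
    """Return the largest k such that some k x k submatrix has average >= tau.

    A submatrix is given by any k row indices and any k column indices.
    Exhaustive search, so only practical for tiny square matrices.
    Returns 0 if no single entry reaches tau.
    """
    n = len(matrix)
    best = 0
    for k in range(1, n + 1):
        found = False
        for rows in itertools.combinations(range(n), k):
            for cols in itertools.combinations(range(n), k):
                total = sum(matrix[i][j] for i in rows for j in cols)
                # average >= tau  is equivalent to  total >= tau * k^2
                if total >= tau * k * k:
                    found = True
                    break
            if found:
                break
        if found:
            best = k
    return best


if __name__ == "__main__":
    # Illustrative run on a small matrix of i.i.d. standard normal entries.
    rng = random.Random(0)
    n = 6
    w = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    print(max_large_average_size(w, tau=1.0))
```

The loop scans every k rather than stopping at the first failure, since the existence of a qualifying submatrix of one size is not assumed to settle the question for other sizes. The paper's thresholds concern the asymptotic behavior of this maximal size as the matrix grows, which a brute-force search of this kind cannot reach.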

Article information

Bernoulli, Volume 19, Number 1 (2013), 275-294.

First available in Project Euclid: 18 January 2013


Keywords: analysis of variance; data mining; Gaussian random matrix; large average submatrix; random matrix theory; second moment method


Sun, Xing; Nobel, Andrew B. On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix. Bernoulli 19 (2013), no. 1, 275-294. doi:10.3150/11-BEJ394.


