## Bernoulli

• Volume 19, Number 5B (2013), 2652-2688.

### Detection of a sparse submatrix of a high-dimensional noisy matrix

#### Abstract

We observe an $N\times M$ matrix $Y_{ij}=s_{ij}+\xi_{ij}$ with $\xi_{ij}\sim\mathcal{N}(0,1)$ i.i.d. in $i,j$, and $s_{ij}\in\mathbb{R}$. We test the null hypothesis $s_{ij}=0$ for all $i,j$ against the alternative that there exists some submatrix of size $n\times m$ with significant elements in the sense that $s_{ij}\ge a>0$. We propose a test procedure and compute the asymptotic detection boundary $a$ so that the maximal testing risk tends to $0$ as $M\to\infty$, $N\to\infty$, $p=n/N\to0$, $q=m/M\to0$. We prove that this boundary is asymptotically sharp minimax under some additional constraints. Relations with other testing problems are discussed. We propose a testing procedure which adapts to unknown $(n,m)$ within some given set and compute the adaptive sharp rates. The implementation of our test procedure on synthetic data shows excellent behavior for sparse, not necessarily square matrices. We extend our sharp minimax results in different directions: first, to Gaussian matrices with unknown variance; next, to matrices of random variables having a distribution from a (non-Gaussian) exponential family; and, finally, to a two-sided alternative for matrices with Gaussian elements.
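The abstract does not spell out the form of the test, so the following is only an illustrative sketch of the observation model, not the authors' procedure. It assumes a simple normalized total-sum statistic, $\sum_{i,j}Y_{ij}/\sqrt{NM}$, which is standard normal under the null hypothesis and is shifted by at least $a\,nm/\sqrt{NM}$ under the alternative. The function names `simulate` and `linear_statistic`, the dimensions, and the signal level are all hypothetical choices made for the example.

```python
import math
import random


def simulate(N, M, n, m, a, rng):
    """Draw Y = s + xi, where s equals a on a random n x m submatrix and 0 elsewhere."""
    rows = set(rng.sample(range(N), n))
    cols = set(rng.sample(range(M), m))
    return [
        [(a if (i in rows and j in cols) else 0.0) + rng.gauss(0.0, 1.0)
         for j in range(M)]
        for i in range(N)
    ]


def linear_statistic(Y):
    """Sum of all entries divided by sqrt(N*M); distributed N(0, 1) under the null."""
    N, M = len(Y), len(Y[0])
    return sum(map(sum, Y)) / math.sqrt(N * M)


rng = random.Random(0)  # fixed seed so the run is reproducible
N, M, n, m, a = 200, 150, 40, 30, 1.0  # illustrative sizes, not from the paper

# Null hypothesis: no signal anywhere (n = m = 0).
t_null = linear_statistic(simulate(N, M, 0, 0, 0.0, rng))
# Alternative: an n x m submatrix with mean a; expected shift is a*n*m/sqrt(N*M).
t_alt = linear_statistic(simulate(N, M, n, m, a, rng))

print(f"null: {t_null:.2f}, alternative: {t_alt:.2f}")
```

With these illustrative values the expected shift under the alternative is about 6.9 standard deviations, so a threshold test on this statistic separates the two cases easily. A total-sum statistic of this kind loses power when the submatrix is very sparse, since the shift $a\,nm/\sqrt{NM}$ shrinks with $p$ and $q$; that regime is not covered by this sketch.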

#### Article information

Source
Bernoulli, Volume 19, Number 5B (2013), 2652-2688.

Dates
First available in Project Euclid: 3 December 2013

Permanent link to this document
https://projecteuclid.org/euclid.bj/1386078616

Digital Object Identifier
doi:10.3150/12-BEJ470

Mathematical Reviews number (MathSciNet)
MR3160567

Zentralblatt MATH identifier
06254575

#### Citation

Butucea, Cristina; Ingster, Yuri I. Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 (2013), no. 5B, 2652–2688. doi:10.3150/12-BEJ470. https://projecteuclid.org/euclid.bj/1386078616
