Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

Sharp detection of smooth signals in a high-dimensional sparse matrix with indirect observations

Cristina Butucea and Ghislaine Gayraud

Abstract

We consider a matrix-valued Gaussian sequence model, that is, we observe a sequence of high-dimensional $M\times N$ matrices of heterogeneous Gaussian random variables $x_{ij,k}$ for $i\in\{1,\ldots,M\}$, $j\in\{1,\ldots,N\}$ and $k\in\mathbb{Z}$, where $M$ and $N$ tend to infinity. For large $|k|$, the standard deviation of our observations is $\epsilon|k|^{s}$ for some $\epsilon>0$, $\epsilon\to0$, and a given $s\geq0$, a case that encompasses mildly ill-posed inverse problems.
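To fix notation, here is a minimal simulation sketch of this observation model in Python/NumPy (not code from the paper). It truncates the frequency index $k$ to a finite symmetric window and takes the noise level $\sigma_k=\epsilon\max(|k|,1)^{s}$, which matches the stated behaviour $\epsilon|k|^{s}$ for large $|k|$; the matrix sizes, the $10\times10$ active block and the coefficient decay below are hypothetical choices for illustration only.

```python
# A minimal simulation sketch of the matrix-valued Gaussian sequence model
# described above (illustrative, not code from the paper). The frequency index
# k is truncated to a finite window of size K, and the noise level is taken as
# sigma_k = eps * max(|k|, 1)**s, matching eps * |k|**s for large |k|.
import numpy as np

def simulate_observations(theta, eps, s, seed=None):
    """theta: array of shape (M, N, K) holding the mean coefficients theta_{ij,k}
    for frequencies k = -(K//2), ..., K//2 (inactive components have theta = 0).
    Returns noisy observations x_{ij,k} of the same shape."""
    rng = np.random.default_rng(seed)
    M, N, K = theta.shape
    ks = np.arange(K) - K // 2                      # frequencies k in a symmetric window
    sigma = eps * np.maximum(np.abs(ks), 1.0) ** s  # heterogeneous noise levels eps*|k|^s
    noise = rng.standard_normal((M, N, K)) * sigma  # independent N(0, sigma_k^2) errors
    return theta + noise

# Hypothetical example: a 200 x 200 matrix with a 10 x 10 active submatrix
# whose coefficients decay like a smooth (Sobolev-tau) signal.
M, N, K, eps, s, tau = 200, 200, 129, 0.05, 1.0, 2.0
theta = np.zeros((M, N, K))
ks = np.arange(K) - K // 2
coef = 1.0 / (1.0 + np.abs(ks)) ** (tau + 0.5)      # illustrative decaying coefficients
theta[:10, :10, :] = coef                            # active components in a small block
x = simulate_observations(theta, eps, s, seed=0)
```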

We give separation rates for the detection of a sparse submatrix of size $m\times n$ ($m$ and $n$ tend to infinity, while $m/M$ and $n/N$ tend to $0$) with active components. A component $(i,j)$ is said to be active if the sequence $\{x_{ij,k}\}_{k}$ has mean $\{\theta_{ij,k}\}_{k}$ within a Sobolev ellipsoid of smoothness $\tau>0$ and total energy $\sum_{k}\theta^{2}_{ij,k}$ larger than some $r^{2}_{\epsilon}$. We construct a test procedure and compute rates, involving relationships between $m$, $n$, $M$, $N$ and $\epsilon$, such that the asymptotic total error probability tends to $0$. We also show how these rates can be made adaptive to the size $(m,n)$ of the submatrix under some constraints.
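The abstract does not describe the test procedure itself, so the following is only a generic chi-square-type detection sketch under the same model; the bandwidth parameter k_max, the Monte Carlo calibration and the maximum over entries are hypothetical illustrative choices, not the authors' procedure or their separation rates.

```python
# Illustrative sketch only: a generic chi-square-type detection statistic for the
# alternative described above; it is NOT the procedure constructed in the paper,
# which the abstract does not specify. For each entry (i, j) we center the squared
# observations by their known variances over a low-frequency band |k| <= k_max,
# standardize, and reject when the maximum over entries exceeds a Monte Carlo
# threshold computed under the null hypothesis (no signal).
import numpy as np

def entrywise_statistics(x, eps, s, k_max):
    """x: observations of shape (M, N, K), frequencies k = -(K//2), ..., K//2.
    Returns an (M, N) array of standardized quadratic statistics."""
    M, N, K = x.shape
    ks = np.arange(K) - K // 2
    sigma2 = (eps * np.maximum(np.abs(ks), 1.0) ** s) ** 2   # noise variances eps^2 |k|^{2s}
    band = np.abs(ks) <= k_max                               # keep low frequencies only
    centered = x[:, :, band] ** 2 - sigma2[band]             # mean theta_{ij,k}^2 under the alternative
    return centered.sum(axis=2) / np.sqrt(2.0 * (sigma2[band] ** 2).sum())

def detect(x, eps, s, k_max, n_mc=100, level=0.05, seed=0):
    """Reject H0 (no active submatrix) if max_{i,j} t_{ij} exceeds the empirical
    (1 - level)-quantile of the null maximum, estimated from pure-noise matrices."""
    rng = np.random.default_rng(seed)
    M, N, K = x.shape
    ks = np.arange(K) - K // 2
    sigma = eps * np.maximum(np.abs(ks), 1.0) ** s
    t_obs = entrywise_statistics(x, eps, s, k_max).max()
    null_max = np.empty(n_mc)
    for b in range(n_mc):
        x0 = rng.standard_normal((M, N, K)) * sigma          # observations under H0
        null_max[b] = entrywise_statistics(x0, eps, s, k_max).max()
    return t_obs > np.quantile(null_max, 1.0 - level)
```

With the simulated data from the previous sketch, `detect(x, eps=0.05, s=1.0, k_max=30)` returns a Boolean rejection decision; the maximum over entries is one simple way to target a sparse alternative, though scanning over $m\times n$ blocks would exploit the submatrix structure more directly.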

We prove corresponding lower bounds under additional assumptions on the relative size of the submatrix within the large matrix of observations; under further assumptions, our separation rates are sharp. A lower bound for a hypothesis testing problem means that no test procedure can distinguish between the null hypothesis (no signal) and the alternative, i.e. the minimax total error probability for testing tends to 1.

Résumé

We consider a model of sequences of $M\times N$ matrices whose entries are heterogeneous Gaussian random variables $x_{ij,k}$, $i\in\{1,\ldots,M\}$, $j\in\{1,\ldots,N\}$, with $M$ and $N$ tending to infinity and $k\in\mathbb{Z}$. For large $|k|$, we assume the standard deviation of $x_{ij,k}$ to be of order $\epsilon|k|^{s}$ for $\epsilon>0$ such that $\epsilon\rightarrow0$ and with $s>0$ known; our model thus includes the setting of mildly ill-posed inverse problems.

Our results are separation rates for the problem of detecting a significant submatrix of size $m\times n$, with $m$ and $n$ tending to infinity, which is sparse, i.e. $m/M$ and $n/N$ tend to $0$. A component $(i,j)$ is said to be active if the sequence $\{x_{ij,k}\}_{k}$ has a mean $\{\theta_{ij,k}\}_{k}$ belonging to a Sobolev ellipsoid of smoothness $\tau>0$ and total energy $\sum_{k}\theta^{2}_{ij,k}$ larger than $r^{2}_{\epsilon}$. We construct a test procedure for which we obtain separation rates involving relationships between $m$, $n$, $M$, $N$ and $\epsilon$, such that the total testing error tends to $0$. We show how to make these rates adaptive to $(m,n)$, the size of the significant submatrices.

Under an additional assumption on the relative size of the submatrices to be detected, we prove the corresponding lower bounds, which ensure that no test procedure can distinguish the null hypothesis from the alternative at rates "better" than those attained by our test procedure. In some cases, we obtain sharp separation rates.

Article information

Source
Ann. Inst. H. Poincaré Probab. Statist., Volume 52, Number 4 (2016), 1564-1591.

Dates
Received: 5 May 2014
Revised: 21 March 2015
Accepted: 21 May 2015
First available in Project Euclid: 17 November 2016

Permanent link to this document
https://projecteuclid.org/euclid.aihp/1479373240

Digital Object Identifier
doi:10.1214/15-AIHP689

Mathematical Reviews number (MathSciNet)
MR3573287

Zentralblatt MATH identifier
1353.62056

Subjects
Primary: 62H15 (Hypothesis testing); 60G15 (Gaussian processes); 62G10 (Hypothesis testing); 62G20 (Asymptotic properties); 62C20 (Minimax procedures)

Keywords
Asymptotic minimax test; Detection boundary; Heterogeneous observations; Gaussian white noise model; High-dimensional data; Indirect observations; Inverse problems; Sharp rates; Sparsity

Citation

Butucea, Cristina; Gayraud, Ghislaine. Sharp detection of smooth signals in a high-dimensional sparse matrix with indirect observations. Ann. Inst. H. Poincaré Probab. Statist. 52 (2016), no. 4, 1564--1591. doi:10.1214/15-AIHP689. https://projecteuclid.org/euclid.aihp/1479373240

