## Bernoulli

• Volume 21, Number 1 (2015), 209-241.

### Detecting positive correlations in a multivariate sample

#### Abstract

We consider the problem of testing whether a correlation matrix of a multivariate normal population is the identity matrix. We focus on sparse classes of alternatives where only a few entries are nonzero and, in fact, positive. We derive a general lower bound applicable to various classes and study the performance of some near-optimal tests. We pay special attention to computational feasibility and construct near-optimal tests that can be computed efficiently. Finally, we apply our results to prove new lower bounds for the clique number of high-dimensional random geometric graphs.

#### Article information

Source
Bernoulli, Volume 21, Number 1 (2015), 209-241.

Dates
First available in Project Euclid: 17 March 2015

Permanent link to this document
https://projecteuclid.org/euclid.bj/1426597068

Digital Object Identifier
doi:10.3150/13-BEJ565

Mathematical Reviews number (MathSciNet)
MR3322317

Zentralblatt MATH identifier
1359.62208

#### Citation

Arias-Castro, Ery; Bubeck, Sébastien; Lugosi, Gábor. Detecting positive correlations in a multivariate sample. Bernoulli 21 (2015), no. 1, 209–241. doi:10.3150/13-BEJ565. https://projecteuclid.org/euclid.bj/1426597068
