Bernoulli

  • Volume 21, Number 1 (2015), 209-241.

Detecting positive correlations in a multivariate sample

Ery Arias-Castro, Sébastien Bubeck, and Gábor Lugosi

Full-text: Open access

Abstract

We consider the problem of testing whether a correlation matrix of a multivariate normal population is the identity matrix. We focus on sparse classes of alternatives where only a few entries are nonzero and, in fact, positive. We derive a general lower bound applicable to various classes and study the performance of some near-optimal tests. We pay special attention to computational feasibility and construct near-optimal tests that can be computed efficiently. Finally, we apply our results to prove new lower bounds for the clique number of high-dimensional random geometric graphs.
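The testing problem in the abstract can be made concrete with a small simulation. This is an illustration only and is not the authors' procedure. It assumes one specific sparse alternative: a hidden set of coordinates that are pairwise equicorrelated with a common ρ > 0, while every other coordinate is independent. It uses the largest off-diagonal sample correlation as a simple baseline statistic, chosen for readability rather than optimality. The function names `sample_equicorrelated` and `max_offdiag_correlation` are hypothetical.

```python
import math
import random


def sample_equicorrelated(n, p, support, rho, rng):
    """Hypothetical helper: draw n vectors in R^p with unit variances.

    Coordinates whose index is in `support` share pairwise correlation rho
    (built from one common Gaussian factor). All other coordinates are
    independent N(0, 1). An empty support or rho = 0 gives the null N(0, I).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this construction")
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # common factor shared by the support
        row = []
        for j in range(p):
            e = rng.gauss(0.0, 1.0)  # independent noise for coordinate j
            if j in support:
                # Var = rho + (1 - rho) = 1; Cov with another support coord = rho
                row.append(math.sqrt(rho) * z + math.sqrt(1.0 - rho) * e)
            else:
                row.append(e)
        data.append(row)
    return data


def max_offdiag_correlation(data):
    """Hypothetical baseline statistic: the largest off-diagonal entry of
    the sample correlation matrix of the rows in `data`."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    best = -1.0
    for a in range(p):
        for b in range(a + 1, p):
            cross = sum(r[a] * r[b] for r in centered)
            best = max(best, cross / (norms[a] * norms[b]))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    null = sample_equicorrelated(200, 30, set(), 0.0, rng)
    alt = sample_equicorrelated(200, 30, {0, 1, 2, 3, 4}, 0.6, rng)
    print(max_offdiag_correlation(null), max_offdiag_correlation(alt))
```

Under the null, every pairwise sample correlation is small. Under this alternative, the pairs inside the hidden set have sample correlation near ρ, so the maximum tends to be larger. A test built on this statistic would reject when it exceeds a threshold calibrated under the null.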

Article information

Source
Bernoulli, Volume 21, Number 1 (2015), 209-241.

Dates
First available in Project Euclid: 17 March 2015

Permanent link to this document
https://projecteuclid.org/euclid.bj/1426597068

Digital Object Identifier
doi:10.3150/13-BEJ565

Mathematical Reviews number (MathSciNet)
MR3322317

Zentralblatt MATH identifier
1359.62208

Keywords
Bayesian detection; high-dimensional data; minimax detection; random geometric graphs; sparse covariance matrix; sparse detection

Citation

Arias-Castro, Ery; Bubeck, Sébastien; Lugosi, Gábor. Detecting positive correlations in a multivariate sample. Bernoulli 21 (2015), no. 1, 209–241. doi:10.3150/13-BEJ565. https://projecteuclid.org/euclid.bj/1426597068


