## The Annals of Statistics

### Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing

#### Abstract

Sample covariance matrices are widely used in multivariate statistical analysis. Central limit theorems (CLTs) for linear spectral statistics of high-dimensional noncentralized sample covariance matrices have received considerable attention in random matrix theory and have been applied to many high-dimensional statistical problems. However, noncentralized sample covariance matrices assume that the population mean vector is known, and some existing results even require Gaussian-like moment conditions. In practice, two other sample covariance matrices are most frequently used, neither of which depends on the unknown population mean vector: the ME (moment estimator, constructed by subtracting the sample mean vector from each sample vector and normalizing by the sample size $n$) and the unbiased sample covariance matrix (obtained by changing the denominator from $n$ to $N=n-1$ in the ME). In this paper, we not only establish new CLTs for noncentralized sample covariance matrices when the Gaussian-like moment conditions do not hold, but also characterize the nonnegligible differences among the CLTs for these three classes of high-dimensional sample covariance matrices by establishing a substitution principle: substituting the adjusted sample size $N=n-1$ for the actual sample size $n$ in the centering term of the new CLTs yields the CLT for the unbiased sample covariance matrix. Moreover, the difference between the CLTs for the ME and the unbiased sample covariance matrix is nonnegligible in the centering term, even though the two estimators differ only by the normalization $n$ versus $n-1$. The new results are applied to two testing problems for high-dimensional covariance matrices.
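The distinction between the two mean-corrected estimators discussed in the abstract can be made concrete with a minimal NumPy sketch (the variable names and simulated data here are illustrative, not from the paper): the ME normalizes the centered cross-product by $n$, the unbiased estimator by $N=n-1$, so the two matrices differ only by the scalar factor $n/(n-1)$ even though, per the paper's substitution principle, their CLT centering terms differ nonnegligibly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                      # sample size and dimension (illustrative)
X = rng.standard_normal((n, p))    # rows are i.i.d. sample vectors

# Center by the sample mean (population mean treated as unknown)
centered = X - X.mean(axis=0)

# ME (moment estimator): normalize by the actual sample size n
S_me = centered.T @ centered / n

# Unbiased estimator: normalize by the adjusted sample size N = n - 1
S_unbiased = centered.T @ centered / (n - 1)

# The two estimators differ only by the scalar factor n / (n - 1)
assert np.allclose(S_unbiased, S_me * n / (n - 1))
```

As a sanity check, `S_unbiased` coincides with `np.cov(X, rowvar=False)`, whose default `ddof=1` implements exactly the $n-1$ normalization.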

#### Article information

Source
Ann. Statist., Volume 43, Number 2 (2015), 546-591.

Dates
First available in Project Euclid: 24 February 2015

https://projecteuclid.org/euclid.aos/1424787428

Digital Object Identifier
doi:10.1214/14-AOS1292

Mathematical Reviews number (MathSciNet)
MR3316190

Zentralblatt MATH identifier
1312.62074

#### Citation

Zheng, Shurong; Bai, Zhidong; Yao, Jianfeng. Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. Ann. Statist. 43 (2015), no. 2, 546--591. doi:10.1214/14-AOS1292. https://projecteuclid.org/euclid.aos/1424787428
