The Annals of Statistics

Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case

Abstract

Consider a Gaussian vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$ consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$, respectively. Given $n$ independent observations of $\mathbf{z}$, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$ from the perspective of canonical correlation analysis. We investigate the high-dimensional case in which both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{uv}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and by $S_{uv}$ its sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. In this paper, we focus on the case where $\Sigma_{xy}$ has finite rank $k$, that is, there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_{1}\geq\cdots\geq r_{k}>0$. We study the sample counterparts of $r_{i}$, $i=1,\ldots,k$, namely the $k$ largest eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$, denoted by $\lambda_{1}\geq\cdots\geq\lambda_{k}$. We show that there exists a threshold $r_{c}\in(0,1)$ such that, for each $i\in\{1,\ldots,k\}$, when $r_{i}\leq r_{c}$, $\lambda_{i}$ converges almost surely to the right edge $d_{+}$ of the limiting spectral distribution of the sample canonical correlation matrix. When $r_{i}>r_{c}$, $\lambda_{i}$ has an almost sure limit in $(d_{+},1]$, from which $r_{i}$ can in turn be recovered, thus providing an estimate of $r_{i}$ in the high-dimensional scenario. We also obtain the limiting distributions of the $\lambda_{i}$ under appropriate normalization. Specifically, $\lambda_{i}$ exhibits Gaussian-type fluctuations if $r_{i}>r_{c}$ and follows the Tracy–Widom distribution if $r_{i}<r_{c}$.
Some applications of our results are also discussed.
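
The abstract does not state closed forms for $d_{+}$, $r_{c}$ or the limit of $\lambda_{i}$. The Python sketch below assumes the following, as we recall them from the body of the paper rather than from this page: $p/n\to c_1$, $q/n\to c_2$ with $c_1,c_2\in(0,1)$ and $c_1+c_2<1$; $d_{+}=(\sqrt{c_1(1-c_2)}+\sqrt{c_2(1-c_1)})^2$; $r_{c}=\sqrt{c_1c_2/((1-c_1)(1-c_2))}$; and, for $r_i>r_c$, $\lambda_i\to r_i(1-c_1+c_1r_i^{-1})(1-c_2+c_2r_i^{-1})$. These assumed formulas are at least mutually consistent: the limit map attains its minimum $d_{+}$ exactly at $r=r_{c}$ and equals $1$ at $r=1$, matching the interval $(d_{+},1]$ in the abstract. All function names are hypothetical, the data are taken to be mean-zero (no centering), and the sample eigenvalues are computed via QR plus SVD, a standard numerical route we chose instead of forming $S_{xx}^{-1}$ and $S_{yy}^{-1}$ explicitly.

```python
import numpy as np

# Hypothetical helpers illustrating the abstract. The closed forms for d_+, r_c
# and the outlier limit are ASSUMPTIONS recalled from the full paper, valid for
# p/n -> c1, q/n -> c2 with c1, c2 in (0, 1) and c1 + c2 < 1.


def sample_cca_eigenvalues(x, y):
    """Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx in decreasing order.

    x is an (n, p) and y an (n, q) data matrix with observations in rows.
    Assumes mean-zero data, so S_uv = u'v / n without centering (the 1/n
    factors cancel). The eigenvalues equal the squared singular values of
    Qx' Qy, where Qx and Qy are orthonormal bases of the column spaces of x, y.
    """
    qx, _ = np.linalg.qr(x)  # reduced QR: qx has shape (n, p)
    qy, _ = np.linalg.qr(y)  # qy has shape (n, q)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)  # sorted in decreasing order
    return s**2  # min(p, q) values; any remaining eigenvalues are zero


def right_edge(c1, c2):
    """Assumed right edge d_+ of the limiting spectral distribution."""
    return (np.sqrt(c1 * (1 - c2)) + np.sqrt(c2 * (1 - c1))) ** 2


def threshold(c1, c2):
    """Assumed phase-transition threshold r_c."""
    return np.sqrt(c1 * c2 / ((1 - c1) * (1 - c2)))


def outlier_limit(r, c1, c2):
    """Assumed almost sure limit of lambda_i given r_i = r."""
    if r <= threshold(c1, c2):
        return right_edge(c1, c2)  # lambda_i sticks to the bulk edge
    return (r * (1 - c1) + c1) * (r * (1 - c2) + c2) / r  # lies in (d_+, 1]


def recover_r(lam, c1, c2):
    """Invert the assumed outlier limit on (r_c, 1]; None if lam <= d_+.

    Solves (1-c1)(1-c2) r^2 + ((1-c1) c2 + (1-c2) c1 - lam) r + c1 c2 = 0
    and keeps the larger root, which is the one lying above r_c.
    """
    if lam <= right_edge(c1, c2):
        return None  # not separated from the bulk: r is not recoverable this way
    a = (1 - c1) * (1 - c2)
    b = (1 - c1) * c2 + (1 - c2) * c1 - lam
    disc = max(b * b - 4 * a * c1 * c2, 0.0)  # guard against rounding near d_+
    return (-b + np.sqrt(disc)) / (2 * a)


def simulate(n, p, q, r, rng):
    """Gaussian sample with Sigma_xx = I_p, Sigma_yy = I_q and
    Cov(x_j, y_j) = sqrt(r_j) for j < k = len(r), zero otherwise, so the
    canonical correlation matrix has nonzero eigenvalues r_1, ..., r_k."""
    r = np.asarray(r, dtype=float)
    k = r.size
    x = rng.standard_normal((n, p))
    y = rng.standard_normal((n, q))
    # y_j = sqrt(r_j) x_j + sqrt(1 - r_j) e_j keeps Var(y_j) = 1
    y[:, :k] = np.sqrt(r) * x[:, :k] + np.sqrt(1 - r) * y[:, :k]
    return x, y


if __name__ == "__main__":
    n, p, q = 2000, 200, 400  # c1 = 0.1, c2 = 0.2, so d_+ = 0.5 and r_c = 1/6
    c1, c2 = p / n, q / n
    r_true = [0.5, 0.1]  # r_1 above r_c, r_2 below it
    x, y = simulate(n, p, q, r_true, np.random.default_rng(0))
    lam = sample_cca_eigenvalues(x, y)
    for i, r in enumerate(r_true):
        print(f"lambda_{i + 1} = {lam[i]:.3f}, "
              f"predicted limit = {outlier_limit(r, c1, c2):.3f}")
    print("estimated r_1 =", recover_r(lam[0], c1, c2))
```

Under these assumptions, with $(c_1,c_2)=(0.1,0.2)$ one has $d_{+}=0.5$ and $r_{c}=1/6$, so $r_1=0.5$ should produce an outlier near $0.66$ while $\lambda_2$ should stay near $0.5$, up to the Gaussian and Tracy–Widom fluctuations described above; the exact simulated values depend on the seed.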

Article information

Source
Ann. Statist., Volume 47, Number 1 (2019), 612-640.

Dates
Revised: March 2018
First available in Project Euclid: 30 November 2018

https://projecteuclid.org/euclid.aos/1543568600

Digital Object Identifier
doi:10.1214/18-AOS1704

Mathematical Reviews number (MathSciNet)
MR3909944

Zentralblatt MATH identifier
07036213

Citation

Bao, Zhigang; Hu, Jiang; Pan, Guangming; Zhou, Wang. Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case. Ann. Statist. 47 (2019), no. 1, 612–640. doi:10.1214/18-AOS1704. https://projecteuclid.org/euclid.aos/1543568600

Supplemental materials

• Supplement to “Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case”. This supplementary material presents simulation results and the proofs of Theorems 2.1 and 2.3, Lemmas 6.1–6.3 and 7.3–7.4, and Proposition 7.1.