Open Access
February 2019 Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case
Zhigang Bao, Jiang Hu, Guangming Pan, Wang Zhou
Ann. Statist. 47(1): 612-640 (February 2019). DOI: 10.1214/18-AOS1704

Abstract

Consider a Gaussian vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$, respectively. With $n$ independent observations of $\mathbf{z}$, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$ from the perspective of canonical correlation analysis. We investigate the high-dimensional case: both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{uv}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{uv}$ its sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are known to be the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. In this paper, we focus on the case in which $\Sigma_{xy}$ is of finite rank $k$, that is, there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_{1}\geq\cdots\geq r_{k}>0$. We study the sample counterparts of $r_{i}$, $i=1,\ldots,k$, that is, the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$, denoted by $\lambda_{1}\geq\cdots\geq\lambda_{k}$. We show that there exists a threshold $r_{c}\in(0,1)$ such that, for each $i\in\{1,\ldots,k\}$, when $r_{i}\leq r_{c}$, $\lambda_{i}$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_{i}>r_{c}$, $\lambda_{i}$ possesses an almost sure limit in $(d_{+},1]$, from which we can in turn recover the $r_{i}$, thus providing an estimate of the latter in the high-dimensional scenario. We also obtain the limiting distributions of the $\lambda_{i}$ under appropriate normalization. Specifically, $\lambda_{i}$ possesses Gaussian-type fluctuations if $r_{i}>r_{c}$, and follows the Tracy–Widom distribution if $r_{i}<r_{c}$. Some applications of our results are also discussed.
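
As a rough illustration of the objects defined in the abstract, the following sketch computes the eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$ with NumPy. It is not the authors' code. The function names `sample_canonical_eigenvalues` and `simulate_rank_one`, the rank-one model with identity covariances, and the choice to skip mean-centering (the simulated data are mean zero) are all assumptions made for illustration. The sketch requires $p\leq n$ and $q\leq n$ so that $S_{xx}$ and $S_{yy}$ are invertible.

```python
import numpy as np


def sample_canonical_eigenvalues(x, y):
    """Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx, sorted in descending order.

    x has shape (n, p) and y has shape (n, q); rows are observations.
    Assumption: the data are mean zero, so no centering is applied.
    """
    n = x.shape[0]
    s_xx = x.T @ x / n
    s_yy = y.T @ y / n
    s_xy = x.T @ y / n
    # Solve linear systems rather than forming explicit inverses.
    a = np.linalg.solve(s_xx, s_xy)    # S_xx^{-1} S_xy, shape (p, q)
    b = np.linalg.solve(s_yy, s_xy.T)  # S_yy^{-1} S_yx, shape (q, p)
    # The product is not symmetric, but its eigenvalues are real in exact
    # arithmetic; discard the tiny imaginary parts from round-off.
    eigvals = np.linalg.eigvals(a @ b).real
    return np.sort(eigvals)[::-1]


def simulate_rank_one(n, p, q, r, seed=0):
    """Hypothetical rank-one model: Sigma_xx = I, Sigma_yy = I, and the only
    nonzero squared canonical correlation equals r (0 < r < 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    y = rng.standard_normal((n, q))
    # Correlate the first coordinates of x and y with correlation sqrt(r).
    y[:, 0] = np.sqrt(r) * x[:, 0] + np.sqrt(1.0 - r) * y[:, 0]
    return x, y


if __name__ == "__main__":
    x, y = simulate_rank_one(n=2000, p=200, q=300, r=0.81)
    lam = sample_canonical_eigenvalues(x, y)
    print("largest sample eigenvalues:", lam[:3])
```

With $p/n$ and $q/n$ held fixed, the abstract's result indicates that the leading eigenvalue printed here should separate from the remaining ones only when $r$ exceeds the threshold $r_{c}$; the sketch does not compute $r_{c}$ or $d_{+}$, whose formulas are not given in the abstract.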

Citation

Zhigang Bao, Jiang Hu, Guangming Pan, Wang Zhou. "Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case." Ann. Statist. 47 (1) 612-640, February 2019. https://doi.org/10.1214/18-AOS1704

Information

Received: 1 June 2017; Revised: 1 March 2018; Published: February 2019
First available in Project Euclid: 30 November 2018

zbMATH: 07036213
MathSciNet: MR3909944
Digital Object Identifier: 10.1214/18-AOS1704

Subjects:
Primary: 60B20, 60F99, 62H20

Keywords: canonical correlation analysis, finite rank perturbation, high-dimensional data, largest eigenvalues, MANOVA ensemble, random matrices

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 47 • No. 1 • February 2019