Bernoulli

Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution

Abstract

Correlation matrices play a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on the use of Pearson’s sample correlation matrix. Although Pearson’s sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator for heavy-tailed distributions. As a robust alternative, Han and Liu [J. Am. Stat. Assoc. 109 (2014) 275–287] advocated the use of a transformed version of the Kendall’s tau sample correlation matrix for estimating the high-dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that, after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall’s tau sample correlation matrix and its transformed version proposed in Han and Liu [J. Am. Stat. Assoc. 109 (2014) 275–287] for estimating the population Kendall’s tau correlation matrix and the latent Pearson’s correlation matrix under both spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of “effective rank” in quantifying the rate of convergence. With regard to the restricted spectral norm, we present, for the first time, a “sign sub-Gaussian condition” that is sufficient to guarantee that the rank-based correlation matrix estimator attains the fast rate of convergence. In both cases, we do not need any moment condition.
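
The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions. It assumes the transformation is the sine map sin(π/2 · τ) commonly used in the rank-based literature, and it assumes the "effective rank" of a matrix M is tr(M)/‖M‖₂. The function names `kendall_tau_matrix`, `latent_correlation`, and `effective_rank` are hypothetical and are not taken from the paper.

```python
import numpy as np

def kendall_tau_matrix(X):
    """Pairwise Kendall's tau (tau-a, no tie correction) for the columns of X."""
    n, _ = X.shape
    # Sign of every pairwise difference between observations: shape (n, n, d).
    S = np.sign(X[:, None, :] - X[None, :, :])
    # Sum over ordered pairs (i, j); pairs with i == j contribute zero.
    return np.einsum("ijk,ijl->kl", S, S) / (n * (n - 1))

def latent_correlation(X):
    """Assumed transform: entrywise sin(pi/2 * tau) of the Kendall's tau matrix."""
    return np.sin(np.pi / 2 * kendall_tau_matrix(X))

def effective_rank(M):
    """Assumed definition: trace divided by the spectral norm."""
    return np.trace(M) / np.linalg.norm(M, 2)

# Illustrative data: correlated Gaussian columns, then monotone marginal
# transformations, which leave the rank-based estimate unchanged.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.0],
                [0.6, 1.0, 0.3],
                [0.0, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=200)
Y = X.copy()
Y[:, 0] = np.exp(Y[:, 0])   # strictly increasing transform of column 0
Y[:, 1] = Y[:, 1] ** 3      # strictly increasing transform of column 1

R_hat = latent_correlation(Y)
print(np.round(R_hat, 2))
print(effective_rank(R_hat))
```

Because the estimate depends on the data only through the signs of pairwise differences, `latent_correlation(X)` and `latent_correlation(Y)` agree, which illustrates why no moment condition on the marginals enters the computation. The pairwise-sign array costs O(n²d) memory, so this sketch is suitable only for small samples.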

Article information

Source
Bernoulli, Volume 23, Number 1 (2017), 23-57.

Dates
Revised: November 2014
First available in Project Euclid: 27 September 2016

https://projecteuclid.org/euclid.bj/1475001347

Digital Object Identifier
doi:10.3150/15-BEJ702

Mathematical Reviews number (MathSciNet)
MR3556765

Zentralblatt MATH identifier
1359.62186

Citation

Han, Fang; Liu, Han. Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution. Bernoulli 23 (2017), no. 1, 23–57. doi:10.3150/15-BEJ702. https://projecteuclid.org/euclid.bj/1475001347

References

• [1] Baik, J. and Silverstein, J.W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
• [2] Berthet, Q. and Rigollet, P. (2013). Computational lower bounds for sparse PCA. Preprint. Available at arXiv:1304.0828.
• [3] Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
• [4] Bickel, P.J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
• [5] Bickel, P.J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• [6] Boente, G., Barrera, M.S. and Tyler, D.E. (2012). A characterization of elliptical distributions and some optimality properties of principal components for functional data. Technical report. Available at http://www.stat.ubc.ca/~matias/Property_FPCA_rev1.pdf.
• [7] Bunea, F. and Xiao, L. (2015). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. Bernoulli 21 1200–1230.
• [8] Cai, T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161 781–815.
• [9] Cai, T.T., Zhang, C.-H. and Zhou, H.H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
• [10] Cai, T.T. and Zhou, H.H. (2012). Minimax estimation of large covariance matrices under $\ell_{1}$-norm. Statist. Sinica 22 1319–1349.
• [11] Chung, F. and Lu, L. (2006). Complex Graphs and Networks. CBMS Regional Conference Series in Mathematics 107. Providence, RI: Amer. Math. Soc.
• [12] Embrechts, P., Lindskog, F. and McNeil, A. (2003). Modelling dependence with copulas and applications to risk management. Handbook of Heavy Tailed Distributions in Finance 8 329–384.
• [13] Fang, H.-B., Fang, K.-T. and Kotz, S. (2002). The meta-elliptical distributions with given marginals. J. Multivariate Anal. 82 1–16.
• [14] Fang, K.T., Kotz, S. and Ng, K.W. (1990). Symmetric Multivariate and Related Distributions. Monographs on Statistics and Applied Probability 36. London: Chapman & Hall.
• [15] Han, F. and Liu, H. (2013). Principal component analysis on non-Gaussian dependent data. J. Mach. Learn. Res. Workshop Conf. Proc. 28 240–248.
• [16] Han, F. and Liu, H. (2014). High dimensional semiparametric scale-invariant principal component analysis. IEEE Trans. Pattern Anal. Mach. Intell. 36 2016–2032.
• [17] Han, F. and Liu, H. (2014). Scale-invariant sparse PCA on high dimensional meta-elliptical data. J. Am. Stat. Assoc. 109 275–287.
• [18] Han, F., Zhao, T. and Liu, H. (2013). CODA: High dimensional copula discriminant analysis. J. Mach. Learn. Res. 14 629–671.
• [19] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
• [20] Hogg, R.V. and Craig, A. (2012). Introduction to Mathematical Statistics, 7th ed. Upper Saddle River: Harlow, Essex.
• [21] Hubbard, J. (1959). Calculation of partition functions. Phys. Rev. Lett. 3 77.
• [22] Johnson, C.R., ed. (1990). Matrix Theory and Applications. Proceedings of Symposia in Applied Mathematics 40. Providence, RI: Amer. Math. Soc.
• [23] Johnstone, I.M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
• [24] Jung, S. and Marron, J.S. (2009). PCA consistency in high dimension, low sample size context. Ann. Statist. 37 4104–4130.
• [25] Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Providence, RI: Amer. Math. Soc.
• [26] Lindskog, F., McNeil, A. and Schmock, U. (2003). Kendall’s tau for elliptical distributions. Credit Risk: Measurement, Evaluation and Management 149–156.
• [27] Liu, H., Han, F., Yuan, M., Lafferty, J. and Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40 2293–2326.
• [28] Liu, H., Han, F. and Zhang, C.-H. (2012). Transelliptical graphical models. In Proceedings of the Twenty-Fifth Annual Conference on Neural Information Processing Systems 809–817.
• [29] Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10 2295–2328.
• [30] Lounici, K. (2014). High-dimensional covariance matrix estimation with missing observations. Bernoulli 20 1029–1058.
• [31] Tropp, J.A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 389–434.
• [32] van de Geer, S. and Lederer, J. (2013). The Bernstein–Orlicz norm and deviation inequalities. Probab. Theory Related Fields 157 225–250.
• [33] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge: Cambridge Univ. Press.
• [34] Vu, V. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. J. Mach. Learn. Res. Workshop Conf. Proc. 22 1278–1286.
• [35] Wegkamp, M. and Zhao, Y. (2013). Analysis of elliptical copula correlation factor model with Kendall’s tau. Personal communication.
• [36] Xue, L. and Zou, H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Statist. 40 2541–2571.
• [37] Yuan, X.-T. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14 899–925.