## The Annals of Statistics

### Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries

Stanislav Minsker

#### Abstract

Estimation of the covariance matrix has attracted a lot of attention from the statistical research community over the years, partially due to important applications such as principal component analysis. However, the frequently used empirical covariance estimator, as well as its modifications, is very sensitive to the presence of outliers in the data. As P. Huber wrote [Ann. Math. Stat. 35 (1964) 73–101], “…This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance….” Motivated by Tukey’s question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes the covariance estimation problem as a special case. Assuming that the entries of the matrix possess only a finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We explain the key ideas behind our construction, and discuss applications to covariance estimation and matrix completion problems.
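To make the abstract's idea concrete, the sketch below illustrates one robust approach in this spirit: a Catoni-type truncated matrix mean, where a bounded influence function `psi` is applied spectrally to each (Hermitian-dilated) sample before averaging, so that heavy-tailed outliers are damped in the operator norm. This is a minimal illustration under simplifying assumptions, not the paper's exact construction; the function names, the tuning parameter `theta`, and the specific choice of `psi` are assumptions made for the example.

```python
import numpy as np

def psi(x):
    # Catoni-type influence function: behaves like the identity near zero
    # but grows only logarithmically, truncating extreme values.
    return np.sign(x) * np.log1p(np.abs(x) + x**2 / 2)

def dilation(a):
    # Hermitian dilation: embeds a (possibly non-symmetric) d1 x d2 matrix
    # into a symmetric (d1 + d2) x (d1 + d2) matrix.
    d1, d2 = a.shape
    h = np.zeros((d1 + d2, d1 + d2))
    h[:d1, d1:] = a
    h[d1:, :d1] = a.T
    return h

def matrix_psi(h, theta):
    # Apply psi to a symmetric matrix through its spectral decomposition.
    vals, vecs = np.linalg.eigh(theta * h)
    return (vecs * psi(vals)) @ vecs.T

def robust_matrix_mean(samples, theta):
    # samples: array of shape (n, d1, d2); theta > 0 sets the truncation
    # level (in theory, chosen from the noise level and confidence target).
    n, d1, d2 = samples.shape
    acc = np.zeros((d1 + d2, d1 + d2))
    for x in samples:
        acc += matrix_psi(dilation(x), theta)
    t_hat = acc / (n * theta)
    # The top-right block of the dilated estimate recovers the d1 x d2 mean.
    return t_hat[:d1, d1:]
```

Since `psi(x) ≈ x` for small `x`, taking `theta` small recovers the ordinary sample mean, while larger `theta` truncates more aggressively; the trade-off between bias and tail control is what tuning `theta` governs in estimators of this type.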

#### Article information

Source
Ann. Statist., Volume 46, Number 6A (2018), 2871-2903.

Dates
Revised: August 2017
First available in Project Euclid: 7 September 2018

https://projecteuclid.org/euclid.aos/1536307236

Digital Object Identifier
doi:10.1214/17-AOS1642

Mathematical Reviews number (MathSciNet)
MR3851758

Zentralblatt MATH identifier
06968602

#### Citation

Minsker, Stanislav. Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann. Statist. 46 (2018), no. 6A, 2871--2903. doi:10.1214/17-AOS1642. https://projecteuclid.org/euclid.aos/1536307236

#### References

• [1] Ahlswede, R. and Winter, A. (2002). Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory 48 569–579.
• [2] Aleksandrov, A. B. and Peller, V. V. (2016). Operator Lipschitz functions. Russian Math. Surveys 71 605.
• [3] Alon, N., Matias, Y. and Szegedy, M. (1996). The space complexity of approximating the frequency moments. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing 20–29. ACM, New York.
• [4] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer, New York.
• [5] Brownlees, C., Joly, E. and Lugosi, G. (2015). Empirical risk minimization for heavy-tailed losses. Ann. Statist. 43 2507–2536.
• [6] Butler, R. W., Davies, P. L. and Jhun, M. (1993). Asymptotics for the minimum covariance determinant estimator. Ann. Statist. 21 1385–1400.
• [7] Cai, T. T., Ren, Z. and Zhou, H. H. (2016). Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electron. J. Stat. 10 1–59.
• [8] Cai, T. T., Zhang, C. H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
• [9] Candès, E. J., Li, X., Ma, Y. and Wright, J. (2011). Robust principal component analysis? J. ACM 58 Art. 11, 37.
• [10] Carlen, E. (2010). Trace inequalities and quantum entropy: An introductory course. Available at http://www.mathphys.org/AZschool/material/AZ09-carlen.pdf.
• [11] Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Probab. Stat. 48 1148–1185.
• [12] Catoni, O. (2016). PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. Preprint. Available at arXiv:1603.05229.
• [13] Davies, L. (1992). The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Statist. 20 1828–1843.
• [14] Devroye, L., Lerasle, M., Lugosi, G. and Oliveira, R. I. (2015). Sub-Gaussian mean estimators. Preprint. Available at arXiv:1509.05845.
• [15] Fan, J., Wang, W. and Zhong, Y. (2016). An $\ell_{\infty}$ eigenvector perturbation bound and its application to robust covariance estimation. Preprint. Available at arXiv:1603.03516.
• [16] Fan, J., Wang, W. and Zhu, Z. (2016). Robust low-rank matrix recovery. Preprint. Available at arXiv:1603.08315.
• [17] Giulini, I. (2015). PAC-Bayesian bounds for Principal Component Analysis in Hilbert spaces. Preprint. Available at arXiv:1511.06263.
• [18] Hsu, D. and Sabato, S. (2016). Loss minimization and parameter estimation with heavy tails. J. Mach. Learn. Res. 17 Paper No. 18, 40.
• [19] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73–101.
• [20] Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics, 2nd ed. Wiley, Hoboken, NJ.
• [21] Hubert, M., Rousseeuw, P. J. and Van Aelst, S. (2008). High-breakdown robust multivariate methods. Statist. Sci. 23 92–119.
• [22] Jerrum, M. R., Valiant, L. G. and Vazirani, V. V. (1986). Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci. 43 169–188.
• [23] Joly, E., Lugosi, G. and Oliveira, R. I. (2017). On the estimation of the mean of a random vector. Electron. J. Stat. 11 440–451.
• [24] Klopp, O., Lounici, K. and Tsybakov, A. B. (2017). Robust matrix completion. Probab. Theory Related Fields 169 523–564.
• [25] Koltchinskii, V. and Lounici, K. (2016). New asymptotic results in principal component analysis. Preprint. Available at arXiv:1601.01457.
• [26] Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23 110–133.
• [27] Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302–2329.
• [28] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
• [29] Lepski, O. (1992). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36 682–697.
• [30] Lerasle, M. and Oliveira, R. I. (2011). Robust empirical mean estimators. Preprint. Available at arXiv:1112.3914.
• [31] Lieb, E. H. (1973). Convex trace functions and the Wigner–Yanase–Dyson conjecture. Adv. Math. 11 267–288.
• [32] Lounici, K. (2014). High-dimensional covariance matrix estimation with missing observations. Bernoulli 20 1029–1058.
• [33] Lugosi, G. and Mendelson, S. (2017). Sub-Gaussian estimators of the mean of a random vector. Preprint. Available at arXiv:1702.00482.
• [34] Maronna, R. A. (1976). Robust $M$-estimators of multivariate location and scatter. Ann. Statist. 4 51–67.
• [35] Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli 21 2308–2335.
• [36] Minsker, S. (2018). Supplement to “Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries.” DOI:10.1214/17-AOS1642SUPP.
• [37] Minsker, S. and Wei, X. (2017). Estimation of the covariance structure of heavy-tailed distributions. Preprint. Available at arXiv:1708.00502.
• [38] Nemirovski, A. and Yudin, D. (1983). Problem Complexity and Method Efficiency in Optimization. Wiley, New York.
• [39] Oliveira, R. I. (2009). Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Preprint. Available at arXiv:0911.0600.
• [40] Srivastava, N. and Vershynin, R. (2013). Covariance estimation for distributions with $2+\varepsilon$ moments. Ann. Probab. 41 3081–3111.
• [41] Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 389–434.
• [42] Tropp, J. A. (2015). An introduction to matrix concentration inequalities. Preprint. Available at arXiv:1501.01571.
• [43] Tyler, D. E. (1987). A distribution-free $M$-estimator of multivariate scatter. Ann. Statist. 15 234–251.
• [44] Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. Preprint. Available at arXiv:1011.3027.
• [45] Zhang, T., Cheng, X. and Singer, A. (2016). Marčenko–Pastur law for Tyler’s $M$-estimator. J. Multivariate Anal. 149 114–123.

#### Supplemental materials

• Supplementary material for the paper: Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The supplement contains technical details and proofs not included in the main text of the paper.