## The Annals of Statistics

### Robust covariance and scatter matrix estimation under Huber’s contamination model

#### Abstract

Covariance matrix estimation is one of the most important problems in statistics. To accommodate the complexity of modern datasets, it is desirable to have estimation procedures that not only incorporate structural assumptions on covariance matrices but are also robust to outliers from arbitrary sources. In this paper, we define a new concept called matrix depth and propose a robust covariance matrix estimator that maximizes the empirical depth function. The proposed estimator is shown to achieve the minimax optimal rate under Huber’s $\varepsilon$-contamination model for estimating covariance/scatter matrices with various structures, including bandedness and sparsity.
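As an illustration only, the setting in the abstract can be sketched numerically. Under Huber’s $\varepsilon$-contamination model, each observation is drawn from the target distribution with probability $1-\varepsilon$ and from an arbitrary distribution $Q$ with probability $\varepsilon$. The Python sketch below draws such a sample and shows how the sample covariance degrades. It then evaluates a projection-based depth score for a candidate matrix $\Gamma$, taken over a finite set of random directions. This depth formula is an assumed form chosen for illustration, not a definition quoted from the abstract. The function names, the point-mass choice of $Q$, and the Gaussian scaling constant `BETA` are likewise assumptions.

```python
import numpy as np

# Assumed constant: (Phi^{-1}(3/4))^2, the median of z^2 for z ~ N(0, 1).
BETA = 0.4549


def sample_huber(n, sigma, eps, rng, outlier_value=50.0):
    """Draw n points from (1 - eps) * N(0, sigma) + eps * Q.

    Q is taken to be a point mass at outlier_value in every coordinate,
    an arbitrary choice made only for this illustration.
    """
    p = sigma.shape[0]
    is_outlier = rng.random(n) < eps
    x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    x[is_outlier] = outlier_value
    return x, is_outlier


def approx_matrix_depth(gamma, x, directions):
    """Projection-based depth of gamma, minimized over the given directions.

    For each unit vector u, compare (u^T x_i)^2 with u^T gamma u and take the
    smaller of the two empirical fractions. Using finitely many directions
    gives an upper bound on the infimum over all directions.
    """
    proj_sq = (x @ directions.T) ** 2                                  # shape (n, m)
    thresh = np.einsum("ij,jk,ik->i", directions, gamma, directions)  # shape (m,)
    below = (proj_sq <= thresh).mean(axis=0)
    above = (proj_sq >= thresh).mean(axis=0)
    return float(np.minimum(below, above).min())


rng = np.random.default_rng(0)
n, p, eps = 2000, 3, 0.1
sigma = np.eye(p)

x_clean, _ = sample_huber(n, sigma, 0.0, rng)
x_contam, is_outlier = sample_huber(n, sigma, eps, rng)

# Operator-norm error of the plain sample covariance, with and without outliers.
err_clean = np.linalg.norm(np.cov(x_clean, rowvar=False) - sigma, ord=2)
err_contam = np.linalg.norm(np.cov(x_contam, rowvar=False) - sigma, ord=2)

# Random unit directions used to approximate the infimum.
directions = rng.standard_normal((500, p))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

depth_true = approx_matrix_depth(BETA * sigma, x_contam, directions)
depth_wrong = approx_matrix_depth(10 * BETA * sigma, x_contam, directions)

print(f"sample covariance error, clean:        {err_clean:.3f}")
print(f"sample covariance error, contaminated: {err_contam:.3f}")
print(f"approx depth at BETA * sigma:          {depth_true:.3f}")
print(f"approx depth at 10 * BETA * sigma:     {depth_wrong:.3f}")
```

In this sketch the correctly scaled candidate scores a higher depth than the inflated one even though a tenth of the sample is contaminated, while the sample covariance is dominated by the outliers. A full estimator would maximize the depth over a structured class of matrices, which this example does not attempt.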

#### Article information

Source
Ann. Statist., Volume 46, Number 5 (2018), 1932-1960.

Dates
Revised: June 2017
First available in Project Euclid: 17 August 2018

https://projecteuclid.org/euclid.aos/1534492824

Digital Object Identifier
doi:10.1214/17-AOS1607

Mathematical Reviews number (MathSciNet)
MR3845006

Zentralblatt MATH identifier
06964321

Subjects
Primary: 62H12: Estimation
Secondary: 62C20: Minimax procedures

#### Citation

Chen, Mengjie; Gao, Chao; Ren, Zhao. Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist. 46 (2018), no. 5, 1932–1960. doi:10.1214/17-AOS1607. https://projecteuclid.org/euclid.aos/1534492824

#### Supplemental materials

• Supplement to “Robust covariance and scatter matrix estimation under Huber’s contamination model”. In this supplement, we collect the proofs of the remaining main results, provide details on the extension to noncentered observations, and present numerical studies in low-to-moderate dimensional settings.