## The Annals of Statistics

### Halfspace depths for scatter, concentration and shape matrices

#### Abstract

We propose halfspace depth concepts for scatter, concentration and shape matrices. For scatter matrices, our concept is similar to those of Chen, Gao and Ren [Robust covariance and scatter matrix estimation under Huber’s contamination model (2018)] and Zhang [J. Multivariate Anal. 82 (2002) 134–165]. Rather than focusing, as in these earlier works, on deepest scatter matrices, we thoroughly investigate the properties of the proposed depth and of the corresponding depth regions. We do so under minimal assumptions; in particular, we restrict attention neither to elliptical distributions nor to absolutely continuous ones. Interestingly, fully understanding scatter halfspace depth requires considering different geometries/topologies on the space of scatter matrices. We also discuss, in the spirit of Zuo and Serfling [Ann. Statist. 28 (2000) 461–482], the structural properties a scatter depth should satisfy, and investigate whether scatter halfspace depth meets them. Companion concepts of depth for concentration matrices and shape matrices are also proposed and studied. We illustrate the practical relevance of these depth concepts on a real-data example from finance.
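The abstract does not restate the definition. As a hedged illustration only, the sketch below assumes a form in the spirit of Chen, Gao and Ren: for a fixed location $\theta$, the depth of a scatter matrix $\Sigma$ is $\inf_{u \in \mathcal{S}^{k-1}} \min\{P[|u'(X-\theta)| \le \sqrt{u'\Sigma u}],\, P[|u'(X-\theta)| \ge \sqrt{u'\Sigma u}]\}$. The helper `scatter_halfspace_depth`, its parameters, and the use of random directions to approximate the infimum are our own assumptions rather than the authors' algorithm; because only finitely many directions are checked, the returned value can only overestimate the sample depth.

```python
import numpy as np


def scatter_halfspace_depth(sigma, data, theta=None, n_dirs=500, seed=0):
    """Approximate sample scatter halfspace depth of `sigma` (hypothetical helper).

    Assumed definition: infimum over unit u of
    min(P_n[|u'(X - theta)| <= sqrt(u' sigma u)], P_n[|u'(X - theta)| >= sqrt(u' sigma u)]).
    The infimum is replaced by a minimum over `n_dirs` random directions.
    """
    data = np.asarray(data, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n, k = data.shape
    theta = np.zeros(k) if theta is None else np.asarray(theta, dtype=float)

    # Random directions, uniform on the unit sphere S^{k-1}.
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_dirs, k))
    u /= np.linalg.norm(u, axis=1, keepdims=True)

    # |u'(X_i - theta)| for every observation i and direction u: shape (n, n_dirs).
    proj = np.abs((data - theta) @ u.T)
    # sqrt(u' sigma u) for every direction: shape (n_dirs,).
    scale = np.sqrt(np.einsum("dk,kl,dl->d", u, sigma, u))

    # Empirical proportions on either side of the threshold, per direction.
    below = np.mean(proj <= scale, axis=0)
    above = np.mean(proj >= scale, axis=0)
    return float(np.min(np.minimum(below, above)))
```

For a sample from $N_2(0, I_2)$, a candidate such as `0.4549 * np.eye(2)` (roughly the median of $\chi^2_1$ times the identity) should receive depth close to 1/2, while an inflated candidate such as `10 * np.eye(2)` should receive depth close to 0. Memory grows as `n * n_dirs`, so large samples would call for processing the directions in batches.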

#### Article information

Source
Ann. Statist., Volume 46, Number 6B (2018), 3276–3307.

Dates
Revised: October 2017
First available in Project Euclid: 11 September 2018

https://projecteuclid.org/euclid.aos/1536631274

Digital Object Identifier
doi:10.1214/17-AOS1658

Mathematical Reviews number (MathSciNet)
MR3852652

Zentralblatt MATH identifier
1408.62100

Subjects
Primary: 62H20: Measures of association (correlation, canonical correlation, etc.)
Secondary: 62G35: Robustness

#### Citation

Paindaveine, Davy; Van Bever, Germain. Halfspace depths for scatter, concentration and shape matrices. Ann. Statist. 46 (2018), no. 6B, 3276–3307. doi:10.1214/17-AOS1658. https://projecteuclid.org/euclid.aos/1536631274

#### References

• Arcones, M. A. and Giné, E. (1993). Limit theorems for $U$-processes. Ann. Probab. 21 1494–1542.
• Berger, M. (2003). A Panoramic View of Riemannian Geometry. Springer, Berlin.
• Bhatia, R. (2007). Positive Definite Matrices. Princeton Univ. Press, Princeton, NJ.
• Bhatia, R. and Holbrook, J. (2006). Riemannian geometry and matrix geometric means. Linear Algebra Appl. 413 594–618.
• Cardot, H., Cénac, P. and Godichon-Baggioni, A. (2017). Online estimation of the geometric median in Hilbert spaces: Nonasymptotic confidence balls. Ann. Statist. 45 591–614.
• Cartan, E. (1929). Groupes simples clos et ouverts et géométrie riemannienne. J. Math. Pures Appl. 8 1–33.
• Chakraborty, A. and Chaudhuri, P. (2014). The spatial distribution in infinite dimensional spaces and related quantiles and depths. Ann. Statist. 42 1203–1231.
• Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. J. Amer. Statist. Assoc. 91 862–872.
• Chen, M., Gao, C. and Ren, Z. (2018). Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist. To appear.
• Claeskens, G., Hubert, M., Slaets, L. and Vakili, K. (2014). Multivariate functional halfspace depth. J. Amer. Statist. Assoc. 109 411–423.
• Cuevas, A., Febrero, M. and Fraiman, R. (2007). Robust estimation and classification for functional data via projection-based depth notions. Comput. Statist. 22 481–496.
• Dang, X. and Serfling, R. J. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. J. Statist. Plann. Inference 140 198–213.
• Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist. 20 1803–1827.
• Dümbgen, L. and Tyler, D. E. (2016). Geodesic convexity and regularized scatter estimators. Available at arXiv:1607.05455v2.
• Fan, Y., Jin, J. and Yao, Z. (2013). Optimal classification in sparse Gaussian graphic model. Ann. Statist. 41 2537–2571.
• Fan, Y. and Lv, J. (2016). Innovated scalable efficient estimation in ultra-large Gaussian graphical models. Ann. Statist. 44 2098–2126.
• Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
• Hallin, M., Paindaveine, D. and Šiman, M. (2010). Multivariate quantiles and multiple-output regression quantiles: From $L_{1}$ optimization to halfspace depth. Ann. Statist. 38 635–669.
• He, Y. and Einmahl, J. H. J. (2017). Estimation of extreme depth-based quantile regions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 449–461.
• Hubert, M., Rousseeuw, P. J. and Segaert, P. (2015). Multivariate functional outlier detection. Stat. Methods Appl. 24 177–202.
• Ilmonen, P. and Paindaveine, D. (2011). Semiparametrically efficient inference based on signed ranks in symmetric independent component models. Ann. Statist. 39 2448–2476.
• Liu, R. Y. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405–414.
• Liu, R. Y., Parelius, J. M. and Singh, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference. Ann. Statist. 27 783–858.
• López-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data. J. Amer. Statist. Assoc. 104 718–734.
• Mizera, I. (2002). On depth and deep points: A calculus. Ann. Statist. 30 1681–1736.
• Mizera, I. and Müller, C. H. (2004). Location-scale depth. J. Amer. Statist. Assoc. 99 949–989.
• Nieto-Reyes, A. and Battey, H. (2016). A topologically valid definition of depth for functional data. Statist. Sci. 31 61–79.
• Paindaveine, D. and Van Bever, G. (2014). Inference on the shape of elliptical distributions based on the MCD. J. Multivariate Anal. 129 125–144.
• Paindaveine, D. and Van Bever, G. (2015). Nonparametrically consistent depth-based classifiers. Bernoulli 21 62–82.
• Paindaveine, D. and Van Bever, G. (2018). Supplement to “Halfspace depths for scatter, concentration and shape matrices.” DOI:10.1214/17-AOS1658SUPP.
• Rousseeuw, P. J. and Hubert, M. (1999). Regression depth. J. Amer. Statist. Assoc. 94 388–433.
• Rousseeuw, P. J. and Ruts, I. (1999). The depth function of a population distribution. Metrika 49 213–244.
• Rousseeuw, P. J. and Struyf, A. (2004). Characterizing angular symmetry and regression symmetry. J. Statist. Plann. Inference 122 161–173.
• Serfling, R. J. (2004). Some perspectives on location and scale depth functions. J. Amer. Statist. Assoc. 99 970–973.
• Serfling, R. (2010). Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation. J. Nonparametr. Stat. 22 915–936.
• Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B. C., 1974), Vol. 2 523–531. Canad. Math. Congress, Montreal.
• Vardi, Y. and Zhang, C.-H. (2000). The multivariate $L_{1}$-median and associated data depth. Proc. Natl. Acad. Sci. USA 97 1423–1426.
• Zhang, J. (2002). Some extensions of Tukey’s depth function. J. Multivariate Anal. 82 134–165.
• Zuo, Y. (2003). Projection-based depth functions and associated medians. Ann. Statist. 31 1460–1490.
• Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Ann. Statist. 28 461–482.

#### Supplemental materials

• Supplement to “Halfspace depths for scatter, concentration and shape matrices”. In this supplement, we conduct a Monte Carlo exercise validating the explicit scatter halfspace depth expressions obtained in the Gaussian and independent Cauchy examples. We also provide illustrations of Theorem 3.3 and Theorems 7.8–7.10. Finally, we prove all theorems stated in this paper.
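The supplement checks explicit depth expressions for a Gaussian example; those expressions are not reproduced on this page. The following is instead our own short derivation, under an assumed definition (the depth of $\Sigma$ is the infimum over unit $u$ of $\min\{P[|u'(X-\theta)| \le \sqrt{u'\Sigma u}],\, P[|u'(X-\theta)| \ge \sqrt{u'\Sigma u}]\}$), for $X \sim N_k(\theta, \Sigma_0)$; it is not a formula quoted from the paper. Writing $r(u) = u'\Sigma u / u'\Sigma_0 u$, the first probability equals $a(u) = 2\Phi(\sqrt{r(u)}) - 1$ and the second equals $1 - a(u)$. Since $a$ increases with $r$, and $r(u)$ ranges over $[\lambda_{\min}, \lambda_{\max}]$, the extreme eigenvalues of $\Sigma_0^{-1}\Sigma$,

$$\mathrm{HD}(\Sigma) = \min\{2\Phi(\sqrt{\lambda_{\min}}) - 1,\; 2 - 2\Phi(\sqrt{\lambda_{\max}})\}.$$

The sketch checks this against a brute-force infimum over a fine grid of directions in dimension 2; both helper names are hypothetical.

```python
import math

import numpy as np


def std_normal_cdf(x):
    """Standard normal cdf Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_scatter_depth(sigma, sigma0):
    """Closed-form depth of `sigma` under N_k(theta, sigma0) (our derivation, hypothetical helper)."""
    # Eigenvalues of sigma0^{-1} sigma, computed from the similar symmetric
    # matrix L^{-1} sigma L^{-T}, where sigma0 = L L' (Cholesky factor).
    l_inv = np.linalg.inv(np.linalg.cholesky(sigma0))
    lam = np.linalg.eigvalsh(l_inv @ sigma @ l_inv.T)
    lam_min, lam_max = lam[0], lam[-1]  # eigvalsh returns ascending order
    return min(2 * std_normal_cdf(math.sqrt(lam_min)) - 1,
               2 - 2 * std_normal_cdf(math.sqrt(lam_max)))


def gaussian_scatter_depth_bruteforce(sigma, sigma0, n_angles=5000):
    """Same quantity for k = 2, taking the infimum over a grid of unit directions."""
    best = 1.0
    # u and -u give the same ratio r(u), so angles in [0, pi) suffice.
    for t in np.linspace(0.0, math.pi, n_angles, endpoint=False):
        u = np.array([math.cos(t), math.sin(t)])
        r = (u @ sigma @ u) / (u @ sigma0 @ u)
        a = 2 * std_normal_cdf(math.sqrt(r)) - 1  # P[|u'(X - theta)| <= sqrt(u' sigma u)]
        best = min(best, a, 1 - a)                # 1 - a = P[... >= ...] for a continuous law
    return best
```

Under this derivation, the depth reaches its maximum value 1/2 exactly when every eigenvalue of $\Sigma_0^{-1}\Sigma$ equals the median of $\chi^2_1$ (about 0.4549), that is, at $\Sigma \approx 0.4549\,\Sigma_0$.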