## The Annals of Statistics

### Robustness properties of S-estimators of multivariate location and shape in high dimension

David M. Rocke

#### Abstract

For the problem of robust estimation of multivariate location and shape, defining S-estimators using scale transformations of a fixed $\rho$ function regardless of the dimension, as is usually done, leads to a perverse outcome: estimators in high dimension can have a breakdown point approaching 50%, but still fail to reject as outliers points that are large distances from the main mass of points. This leads to a form of nonrobustness that has important practical consequences. In this paper, estimators are defined that improve on known S-estimators in having all of the following properties: (1) maximal breakdown for the given sample size and dimension; (2) the ability to reject completely, as outliers, points that are far from the main mass of points; (3) convergence to good solutions with a modest amount of computation from a nonrobust starting point for large (though not near 50%) contamination. However, to attain maximal breakdown, these estimates, like other known maximal breakdown estimators, require large amounts of computational effort. This greater ability of the new estimators to reject outliers comes at a modest cost in efficiency and gross error sensitivity and at a greater, but finite, cost in local shift sensitivity.
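The claim that a fixed $\rho$ stops rejecting distant points in high dimension can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes Tukey's biweight as the fixed $\rho$, assumes clean data are $p$-variate normal (so squared Mahalanobis distances $d^2$ follow $\chi^2_p$), and assumes asymptotic 50% breakdown, i.e. $E[\rho(d)] = \rho(\infty)/2$. For the biweight, $\rho(d)/\rho(\infty) = 1 - (1 - d^2/c^2)^3$ for $d \le c$ and $1$ beyond, so the constant $c$ is found by bisection, and the probability that a clean point already lies beyond the rejection point is $P(\chi^2_p > c^2)$. The helper names (`chi2_pdf`, `biweight_c2`, `chi2_sf`, and so on) are hypothetical, and integration uses a plain Simpson's rule rather than a statistics library.

```python
import math


def chi2_pdf(x: float, p: int) -> float:
    """Density of chi-square with p degrees of freedom (p even, p >= 2 assumed)."""
    k = p / 2.0
    if x <= 0.0:
        return 0.5 if p == 2 else 0.0
    log_f = (k - 1.0) * math.log(x) - x / 2.0 - k * math.log(2.0) - math.lgamma(k)
    return math.exp(log_f)


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def normalized_rho_mean(c2: float, p: int) -> float:
    """E[rho(d)] / rho(inf) for the biweight when d^2 ~ chi^2_p.

    Uses rho/rho(inf) = 1 - (1 - d^2/c^2)^3 inside the rejection point,
    so the mean is 1 - E[(1 - X/c^2)_+^3] with X ~ chi^2_p.
    """
    inner = simpson(lambda x: (1.0 - x / c2) ** 3 * chi2_pdf(x, p), 0.0, c2)
    return 1.0 - inner


def biweight_c2(p: int, r: float = 0.5) -> float:
    """Bisection for c^2 such that E[rho(d)] = r * rho(inf) (assumed r = 0.5)."""
    lo, hi = 1e-3, 50.0 * p
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # The normalized mean decreases as c^2 grows.
        if normalized_rho_mean(mid, p) > r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_sf(x0: float, p: int) -> float:
    """P(chi^2_p > x0), integrating the density over a long finite tail."""
    upper = max(x0, p) + 40.0 * math.sqrt(2.0 * p) + 100.0
    return simpson(lambda x: chi2_pdf(x, p), x0, upper, n=4000)


if __name__ == "__main__":
    for p in (2, 4, 10, 20, 50):
        c2 = biweight_c2(p)
        print(f"p={p:3d}  c^2/p={c2 / p:5.2f}  P(chi2_p > c^2)={chi2_sf(c2, p):.3e}")
```

Under these assumptions the rejection point $c$ grows roughly in proportion to $\sqrt{p}$, while the clean distances concentrate near $\sqrt{p}$ with a relatively shrinking spread, so the tail beyond $c$ becomes vanishingly small as $p$ increases: a point must be extremely far out before it receives zero weight.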

#### Article information

Source: Ann. Statist., Volume 24, Number 3 (1996), 1327-1345.

Dates: first available in Project Euclid 20 September 2002

https://projecteuclid.org/euclid.aos/1032526972

Digital Object Identifier: doi:10.1214/aos/1032526972

Mathematical Reviews number (MathSciNet): MR1401853

Zentralblatt MATH identifier: 0862.62049

Subjects: Primary: 62H12 (Estimation), 62F35 (Robustness and adaptive procedures)

#### Citation

Rocke, David M. Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Statist. 24 (1996), no. 3, 1327-1345. doi:10.1214/aos/1032526972. https://projecteuclid.org/euclid.aos/1032526972
