The Annals of Statistics

Propagation of outliers in multivariate data

Fatemah Alqallaf, Stefan Van Aelst, Victor J. Yohai, and Ruben H. Zamar

Full-text: Open access

Abstract

We investigate the performance of robust estimates of multivariate location under nonstandard data contamination models such as componentwise outliers (i.e., contamination in each variable is independent of the other variables). This model brings up a possible new source of statistical error that we call “propagation of outliers.” This source of error is unusual in the sense that it is generated by the data processing itself and takes place after the data have been collected. We define and derive the influence function of robust multivariate location estimates under flexible contamination models and use it to investigate the effect of propagation of outliers. Furthermore, we show that standard high-breakdown affine equivariant estimators propagate outliers and therefore show poor breakdown behavior under componentwise contamination when the dimension d is high.
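
The dimension effect described above can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that the abstract does not state: every cell of the data matrix is contaminated independently with the same probability eps. Under that assumption, a row of d cells contains at least one contaminated cell with probability 1 - (1 - eps)^d, so the share of affected rows grows quickly with d even when eps is small. The function names below are hypothetical and are not taken from the paper.

```python
import random


def contaminated_row_fraction(eps: float, d: int) -> float:
    """Probability that a row of d cells has at least one contaminated cell,
    assuming each cell is contaminated independently with probability eps."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return 1.0 - (1.0 - eps) ** d


def smallest_dimension_exceeding(eps: float, level: float = 0.5) -> int:
    """Smallest dimension d at which the contaminated-row fraction exceeds level."""
    d = 1
    while contaminated_row_fraction(eps, d) <= level:
        d += 1
    return d


def simulate_row_fraction(eps: float, d: int, n_rows: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the formula: draw independent contamination
    indicators for every cell and count rows with at least one hit."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < eps for _ in range(d)) for _ in range(n_rows)
    )
    return hits / n_rows


if __name__ == "__main__":
    eps = 0.05  # illustrative cell-level contamination rate (an assumption)
    for d in (1, 5, 14, 50):
        exact = contaminated_row_fraction(eps, d)
        approx = simulate_row_fraction(eps, d)
        print(f"d={d:3d}  exact={exact:.3f}  simulated={approx:.3f}")
    print("first d with more than half the rows affected:",
          smallest_dimension_exceeding(eps))
```

With 5% of cells contaminated, more than half of the rows contain at least one contaminated cell once d reaches 14. An estimator that treats each row as a single unit, and therefore regards a whole row as contaminated when any one cell is, then faces a majority of contaminated rows.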

Article information

Source
Ann. Statist., Volume 37, Number 1 (2009), 311-331.

Dates
First available in Project Euclid: 16 January 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1232115936

Digital Object Identifier
doi:10.1214/07-AOS588

Mathematical Reviews number (MathSciNet)
MR2488353

Zentralblatt MATH identifier
1155.62043

Subjects
Primary: 62F35: Robustness and adaptive procedures
Secondary: 62H12: Estimation

Keywords
Breakdown point; contamination model; independent contamination; influence function; robustness

Citation

Alqallaf, Fatemah; Van Aelst, Stefan; Yohai, Victor J.; Zamar, Ruben H. Propagation of outliers in multivariate data. Ann. Statist. 37 (2009), no. 1, 311–331. doi:10.1214/07-AOS588. https://projecteuclid.org/euclid.aos/1232115936


