## The Annals of Statistics

### Breakdown points for maximum likelihood estimators of location–scale mixtures

Christian Hennig

#### Abstract

ML-estimation based on mixtures of Normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of t-distributions by McLachlan and Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a further mixture component accounting for “noise” by Fraley and Raftery [The Computer J. 41 (1998) 578–588] were suggested as more robust alternatives. In this paper, the definition of an adequate robustness measure for cluster analysis is discussed and bounds for the breakdown points of the mentioned methods are given. It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on Normal mixtures. If the number of clusters s is treated as fixed, r additional points suffice for all three methods to let the parameters of r clusters explode. Only in the case of r=s is this not possible for t-mixtures. The ability to estimate the number of mixture components, for example, by use of the Bayesian information criterion of Schwarz [Ann. Statist. 6 (1978) 461–464], and to isolate gross outliers as clusters of one point, is crucial for an improved breakdown behavior of all three techniques. Furthermore, a mixture of Normals with an improper uniform distribution is proposed to achieve more robustness in the case of a fixed number of components.
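The breakdown mechanism described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's proofs or compute ML estimates. It only compares the Normal-mixture log-likelihood of three hand-picked parameter configurations on a toy one-dimensional data set with one added point. The data, the scale bound `SIGMA_MIN`, the configuration names (`keep`, `sacrifice`, `isolate`), the parameter count `3s - 1`, and the convention BIC = 2 log L - k log n (larger is better) are all assumptions made for this illustration.

```python
import math

SIGMA_MIN = 0.1  # assumed lower bound on component scales (illustrative value)

# Toy data: two well-separated clusters (hypothetical values).
CLUSTER_A = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
CLUSTER_B = [10.0, 10.2, 10.4, 10.6, 10.8, 11.0]


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_loglik(data, components):
    """Log-likelihood of a Normal mixture; components = [(weight, mu, sigma), ...].

    Uses log-sum-exp so that gross outliers do not cause log(0).
    """
    total = 0.0
    for x in data:
        terms = [math.log(w) + normal_logpdf(x, mu, sigma)
                 for w, mu, sigma in components]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def fit_single(values):
    """Mean and ML standard deviation of one group, with the scale bounded below."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(math.sqrt(var), SIGMA_MIN)


def bic(loglik, n_components, n_points):
    """BIC in the 'larger is better' convention, with 3s - 1 free parameters."""
    k = 3 * n_components - 1
    return 2.0 * loglik - k * math.log(n_points)


def candidate_logliks(outlier):
    """Log-likelihoods of three hand-picked configurations (not ML estimates)."""
    data = CLUSTER_A + CLUSTER_B + [outlier]
    n = len(data)
    mu_a, sd_a = fit_single(CLUSTER_A)
    mu_b, sd_b = fit_single(CLUSTER_B)
    mu_ab, sd_ab = fit_single(CLUSTER_A + CLUSTER_B)
    configs = {
        # s = 2: both components stay on the original clusters.
        "keep": [(0.5, mu_a, sd_a), (0.5, mu_b, sd_b)],
        # s = 2: one component is given up to sit on the added point.
        "sacrifice": [((n - 1) / n, mu_ab, sd_ab), (1 / n, outlier, SIGMA_MIN)],
        # s = 3: the added point becomes a one-point cluster of its own.
        "isolate": [(len(CLUSTER_A) / n, mu_a, sd_a),
                    (len(CLUSTER_B) / n, mu_b, sd_b),
                    (1 / n, outlier, SIGMA_MIN)],
    }
    return {name: mixture_loglik(data, comps) for name, comps in configs.items()}


if __name__ == "__main__":
    for outlier in (12.0, 1e3, 1e6):
        ll = candidate_logliks(outlier)
        n = len(CLUSTER_A) + len(CLUSTER_B) + 1
        print(outlier, ll, bic(ll["sacrifice"], 2, n), bic(ll["isolate"], 3, n))
```

In this sketch, for a moderate added point the `keep` configuration has the higher likelihood of the two fixed-s candidates. As the point moves far away, the log-likelihood of `keep` decreases without bound while that of `sacrifice` stays constant, so with s fixed the likelihood eventually favours a component whose location follows the outlier. When a third component is allowed, the `isolate` configuration scores better under the assumed BIC convention, which mirrors the abstract's remark that estimating the number of components and isolating gross outliers as one-point clusters is what improves breakdown behavior.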

#### Article information

Source
Ann. Statist., Volume 32, Number 4 (2004), 1313–1340.

Dates
First available in Project Euclid: 4 August 2004

Permanent link to this document
https://projecteuclid.org/euclid.aos/1091626171

Digital Object Identifier
doi:10.1214/009053604000000571

Mathematical Reviews number (MathSciNet)
MR2089126

Zentralblatt MATH identifier
1047.62063

#### Citation

Hennig, Christian. Breakdown points for maximum likelihood estimators of location–scale mixtures. Ann. Statist. 32 (2004), no. 4, 1313–1340. doi:10.1214/009053604000000571. https://projecteuclid.org/euclid.aos/1091626171

#### References

• Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control 19 716–723.
• Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49 803–821.
• Bozdogan, H. (1994). Mixture model cluster analysis using model selection criteria and a new informational measure of complexity. In Multivariate Statistical Modeling. Proc. First US/Japan Conference on the Frontiers of Statistical Modeling. An Informational Approach (H. Bozdogan, ed.) 2 69–113. Kluwer, Dordrecht.
• Bryant, P. and Williamson, J. A. (1986). Maximum likelihood and classification: A comparison of three approaches. In Classification as a Tool of Research (W. Gaul and M. Schader, eds.) 35–45. North-Holland, Amsterdam.
• Byers, S. and Raftery, A. E. (1998). Nearest neighbor clutter removal for estimating features in spatial point processes. J. Amer. Statist. Assoc. 93 577–584.
• Campbell, N. A. (1984). Mixture models and atypical values. Math. Geol. 16 465–477.
• Celeux, G. and Soromenho, G. (1996). An entropy criterion for assessing the number of clusters in a mixture model. J. Classification 13 195–212.
• Davies, P. L. and Gather, U. (1993). The identification of multiple outliers (with discussion). J. Amer. Statist. Assoc. 88 782–801.
• Davies, P. L. and Gather, U. (2002). Breakdown and groups. Technical Report 57-2002, SFB 475, Univ. Dortmund. Available at wwwstat.mathematik.uni-essen.de/~davies/brkdown220902.ps.gz.
• Day, N. E. (1969). Estimating the components of a mixture of normal distributions. Biometrika 56 463–474.
• Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39 1–38.
• DeSarbo, W. S. and Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. J. Classification 5 249–282.
• Donoho, D. L. and Huber, P. J. (1983). The notion of breakdown point. In A Festschrift for Erich L. Lehmann (P. J. Bickel, K. Doksum and J. L. Hodges, Jr., eds.) 157–184. Wadsworth, Belmont, CA.
• Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer J. 41 578–588.
• Gallegos, M. T. (2003). Clustering in the presence of outliers. In Exploratory Data Analysis in Empirical Research (M. Schwaiger and O. Opitz, eds.) 58–66. Springer, Berlin.
• Garcia-Escudero, L. A. and Gordaliza, A. (1999). Robustness properties of $k$ means and trimmed $k$ means. J. Amer. Statist. Assoc. 94 956–969.
• Hampel, F. R. (1971). A general qualitative definition of robustness. Ann. Math. Statist. 42 1887–1896.
• Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383–393.
• Hastie, T. and Tibshirani, R. (1996). Discriminant analysis by Gaussian mixtures. J. Roy. Statist. Soc. Ser. B 58 155–176.
• Hathaway, R. J. (1985). A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann. Statist. 13 795–800.
• Hathaway, R. J. (1986). A constrained EM algorithm for univariate normal mixtures. J. Stat. Comput. Simul. 23 211–230.
• Hennig, C. (2003). Robustness of ML estimators of location–scale mixtures. Available at www.math.uni-hamburg.de/home/hennig/papers/hennigcottbus.pdf.
• Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73–101.
• Huber, P. J. (1981). Robust Statistics. Wiley, New York.
• Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhyā Ser. A 62 49–66.
• Kharin, Y. (1996). Robustness in Statistical Pattern Recognition. Kluwer, Dordrecht.
• Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications. IMS, Hayward, CA.
• Markatou, M. (2000). Mixture models, robustness, and the weighted likelihood methodology. Biometrics 56 483–486.
• McLachlan, G. J. (1982). The classification and mixture maximum likelihood approaches to cluster analysis. In Handbook of Statistics (P. R. Krishnaiah and L. Kanal, eds.) 2 199–208. North-Holland, Amsterdam.
• McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Appl. Statist. 36 318–324.
• McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering. Dekker, New York.
• McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.
• Peel, D. and McLachlan, G. J. (2000). Robust mixture modeling using the $t$ distribution. Stat. Comput. 10 339–348.
• Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 195–239.
• Rocke, D. M. and Woodruff, D. L. (2000). A synthesis of outlier detection and cluster identification. Unpublished manuscript.
• Roeder, K. and Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. J. Amer. Statist. Assoc. 92 894–902.
• Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
• Tyler, D. E. (1994). Finite sample breakdown points of projection based multivariate location and scatter statistics. Ann. Statist. 22 1024–1044.
• Wang, H. H. and Zhang, H. (2002). Model-based clustering for cross-sectional time series data. J. Agric. Biol. Environ. Statist. 7 107–127.
• Wolfe, J. H. (1967). NORMIX: Computational methods for estimating the parameters of multivariate normal mixtures of distributions. Research Memo SRM 68-2, U.S. Naval Personnel Research Activity, San Diego.
• Zhang, J. and Li, G. (1998). Breakdown properties of location M-estimators. Ann. Statist. 26 1170–1189.
• Zuo, Y. (2001). Some quantitative relationships between two types of finite sample breakdown point. Statist. Probab. Lett. 51 369–375.