## Statistical Science

### Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling

#### Abstract

Over the past ten years there has been a dramatic increase in interest in the Bayesian analysis of finite mixture models, driven primarily by the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps underappreciated, problems associated with the MCMC analysis of mixtures. These problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to so-called label switching in the MCMC output. As a consequence, ergodic averages of component-specific quantities are identical across components and thus useless for inference. We review solutions to the label switching problem, such as artificial identifiability constraints, relabelling algorithms and label-invariant loss functions. We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.
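The pathology the abstract describes is easy to reproduce. The sketch below is illustrative only (the two-component Gaussian setup, seed and noise scale are assumptions, not taken from the paper): it emulates well-mixed MCMC output for two component means under a symmetric prior, randomly permutes the labels on each draw, and shows why raw ergodic averages collapse, together with the simplest remedy mentioned in the abstract, an artificial identifiability constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000
true_means = np.array([-2.0, 2.0])  # hypothetical two-component setup

# Emulate MCMC output under a symmetric prior: each draw is the pair of
# component means (plus posterior noise), with the labels randomly
# permuted on roughly half the draws -- the "label switching" effect.
draws = true_means + 0.1 * rng.standard_normal((n_draws, 2))
flip = rng.random(n_draws) < 0.5
draws[flip] = draws[flip, ::-1]

# Raw ergodic averages: both components collapse toward the same value
# (here ~0), so they carry no component-specific information.
raw = draws.mean(axis=0)

# Simplest fix: impose an artificial identifiability constraint by
# relabelling each draw so that mu_1 < mu_2, then average.
relabelled = np.sort(draws, axis=1)
fixed = relabelled.mean(axis=0)

print("raw averages:  ", raw)    # both near 0 -- useless
print("constrained:   ", fixed)  # near (-2, 2) -- recovered
```

The ordering constraint is only the crudest of the remedies the paper surveys; relabelling algorithms and label-invariant loss functions address cases where no single constraint separates the components cleanly.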

#### Article information

Source
Statist. Sci., Volume 20, Number 1 (2005), 50--67.

Dates
First available in Project Euclid: 6 June 2005

https://projecteuclid.org/euclid.ss/1118065042

Digital Object Identifier
doi:10.1214/088342305000000016

Mathematical Reviews number (MathSciNet)
MR2182987

Zentralblatt MATH identifier
1100.62032

#### Citation

Jasra, A.; Holmes, C. C.; Stephens, D. A. Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling. Statist. Sci. 20 (2005), no. 1, 50--67. doi:10.1214/088342305000000016. https://projecteuclid.org/euclid.ss/1118065042

#### References

• Aitkin, M. (2001). Likelihood and Bayesian analysis of mixtures. Statistical Modelling 1 287--304.
• Bartlett, M. S. (1957). A comment on D. V. Lindley's statistical paradox. Biometrika 44 533--534.
• Baum, L. E. and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statist. 37 1554--1563.
• Baum, L. E., Petrie, T., Soules, G. and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41 164--171.
• Beal, M. J., Ghahramani, Z. and Rasmussen, C. E. (2002). The infinite hidden Markov model. In Advances in Neural Information Processing Systems 14 (T. G. Dietterich, S. Becker and Z. Ghahramani, eds.) 577--584. MIT Press, Cambridge, MA.
• Bernardo, J. M. and Girón, F. J. (1988). A Bayesian analysis of simple mixture problems. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 67--78. Oxford Univ. Press.
• Boys, R. J. and Henderson, D. A. (2003). Data augmentation and marginal updating schemes for inference in hidden Markov models. Technical report, Univ. Newcastle.
• Boys, R. J. and Henderson, D. A. (2004). A Bayesian approach to DNA sequence segmentation (with discussion). Biometrics 60 573--588.
• Cappé, O., Robert, C. P. and Rydén, T. (2001). Reversible jump MCMC converging to birth-and-death MCMC and more general continuous time samplers. Technical report, Univ. Paris Dauphine.
• Cappé, O., Robert, C. P. and Rydén, T. (2003). Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. J. R. Stat. Soc. Ser. B Stat. Methodol. 65 679--700.
• Casella, G., Mengersen, K. L., Robert, C. P. and Titterington, D. M. (2002). Perfect samplers for mixtures of distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 777--790.
• Celeux, G. (1997). Discussion of "On Bayesian analysis of mixtures with an unknown number of components," by S. Richardson and P. J. Green. J. Roy. Statist. Soc. Ser. B 59 775--776.
• Celeux, G. (1998). Bayesian inference for mixtures: The label-switching problem. In COMPSTAT 98---Proc. in Computational Statistics (R. Payne and P. J. Green, eds.) 227--232. Physica, Heidelberg.
• Celeux, G., Hurn, M. and Robert, C. P. (2000). Computational and inferential difficulties with mixture posterior distributions. J. Amer. Statist. Assoc. 95 957--970.
• Ciuperca, G., Ridolfi, A. and Idier, J. (2003). Penalized maximum likelihood estimator for normal mixtures. Scand. J. Statist. 30 45--59.
• Dellaportas, P. and Papageorgiou, I. (2004). Multivariate mixtures of normals with an unknown number of components. Technical report, Athens Univ.
• Dellaportas, P., Stephens, D. A., Smith, A. F. M. and Guttman, I. (1996). A comparative study of perinatal mortality using a two-component mixture model. In Bayesian Biostatistics (D. A. Berry and D. K. Stangl, eds.) 601--616. Dekker, New York.
• Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39 1--38.
• Diebolt, J. and Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. J. Roy. Statist. Soc. Ser. B 56 363--375.
• Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577--588.
• Fearnhead, P. (2004). Exact and efficient Bayesian inference for multiple changepoint problems. Technical report, Univ. Lancaster.
• Frühwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. Amer. Statist. Assoc. 96 194--209.
• Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711--732.
• Green, P. J. (2003). Trans-dimensional Markov chain Monte Carlo. In Highly Structured Stochastic Systems (P. J. Green, N. L. Hjort and S. Richardson, eds.) 179--196. Oxford Univ. Press.
• Green, P. J. and Richardson, S. (2002). Hidden Markov models and disease mapping. J. Amer. Statist. Assoc. 97 1055--1070.
• Gruet, M.-A., Philippe, A. and Robert, C. P. (1999). MCMC control spreadsheets for exponential mixture estimation. J. Comput. Graph. Statist. 8 298--317.
• Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97--109.
• Hurn, M., Justel, A. and Robert, C. P. (2003). Estimating mixtures of regressions. J. Comput. Graph. Statist. 12 55--79.
• Jasra, A., Stephens, D. A. and Holmes, C. C. (2005). Population-based reversible jump Markov chain Monte Carlo. Technical report, Imperial College London.
• Jennison, C. (1997). Discussion of "On Bayesian analysis of mixtures with an unknown number of components," by S. Richardson and P. J. Green. J. Roy. Statist. Soc. Ser. B 59 778--779.
• Liang, F. and Wong, W. H. (2001). Real parameter evolutionary Monte Carlo with applications to Bayesian mixture models. J. Amer. Statist. Assoc. 96 653--666.
• Lindley, D. V. (1957). A statistical paradox. Biometrika 44 187--192.
• Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
• Marin, J.-M., Mengersen, K. L. and Robert, C. P. (2005). Bayesian modelling and inference on mixtures of distributions. In Handbook of Statistics 25 (D. Dey and C. R. Rao, eds.). North-Holland, Amsterdam.
• McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, Chichester.
• Mengersen, K. L. and Robert, C. P. (1996). Testing for mixtures: A Bayesian entropic approach (with discussion). In Bayesian Statistics 5 (J. O. Berger, J. M. Bernardo, A. P. Dawid, D. V. Lindley and A. F. M. Smith, eds.) 255--276. Oxford Univ. Press.
• Neal, R. M. (1996). Sampling from multimodal distributions using tempered transitions. Statist. Comput. 6 353--366.
• Newcomb, S. (1886). A generalized theory of the combination of observations so as to obtain the best result. Amer. J. Math. 8 343--366.
• Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philos. Trans. Roy. Soc. London Ser. A 185 71--110.
• Postman, M., Huchra, J. P. and Geller, M. J. (1986). Probes of large-scale structure in the Corona Borealis region. Astronomical J. 92 1238--1246.
• Rao, C. R. (1948). The utilization of multiple measurements in problems of biological classification (with discussion). J. Roy. Statist. Soc. Ser. B 10 159--203.
• Richardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. Roy. Statist. Soc. Ser. B 59 731--792.
• Robert, C. P. (1997). Discussion of "On Bayesian analysis of mixtures with an unknown number of components," by S. Richardson and P. J. Green. J. Roy. Statist. Soc. Ser. B 59 758--764.
• Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York.
• Robert, C. P., Rydén, T. and Titterington, D. M. (2000). Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 57--75.
• Roeder, K. (1990). Density estimation with confidence sets exemplified by superclusters and voids in galaxies. J. Amer. Statist. Assoc. 85 617--624.
• Stephens, M. (1997a). Bayesian methods for mixtures of normal distributions. D.Phil. dissertation, Dept. Statistics, Univ. Oxford.
• Stephens, M. (1997b). Discussion of "On Bayesian analysis of mixtures with an unknown number of components," by S. Richardson and P. J. Green. J. Roy. Statist. Soc. Ser. B 59 768--769.
• Stephens, M. (2000a). Bayesian analysis of mixture models with an unknown number of components---An alternative to reversible jump methods. Ann. Statist. 28 40--74.
• Stephens, M. (2000b). Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 795--809.
• Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Ann. Statist. 22 1701--1762.
• Titterington, D. M., Smith, A. F. M. and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. Wiley, Chichester.
• West, M. (1997). Discussion of "On Bayesian analysis of mixtures with an unknown number of components," by S. Richardson and P. J. Green. J. Roy. Statist. Soc. Ser. B 59 783--784.
• Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E. and Ruzzo, W. L. (2001). Model-based clustering and data transformations for gene expression data. Bioinformatics 17 977--987.