Adaptive MCMC with online relabeling

Rémi Bardenet, Olivier Cappé, Gersende Fort, and Balázs Kégl


When targeting a distribution that is artificially invariant under some permutations, Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem, rendering marginal inference particularly cumbersome. Such a situation arises, for example, in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM), which self-calibrates its proposal distribution using an online estimate of the covariance matrix of the target, are no exception. To address the label-switching issue, relabeling algorithms associate a permutation with each MCMC sample, trying to obtain reasonable marginals. In the case of adaptive Metropolis (Bernoulli 7 (2001) 223–242), an online relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably consistent variant of AM that can cope with the label-switching problem. The idea is to nest relabeling steps within the MCMC algorithm, based on the estimation of a single covariance matrix that is used both for adapting the covariance of the proposal distribution in the Metropolis step and for online relabeling. We compare the behavior of AMOR to similar relabeling methods. In the case of compactly supported target distributions, we prove a strong law of large numbers for AMOR and its ergodicity. To our knowledge, these are the first results on the consistency of an online relabeling algorithm. The proof underlines latent relations between relabeling and vector quantization.
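
The nesting of relabeling inside an adaptive Metropolis loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy two-mode target, the function names (log_target, relabel, amor_sketch), the step size 1/(t+1), the 2.38²/d proposal scaling, and in particular the plain symmetric Metropolis acceptance ratio, which may differ from the acceptance rule the authors analyze. The only feature borrowed from the abstract is that one running mean and covariance estimate drives both the proposal and the choice of permutation.

```python
import itertools
import numpy as np

def log_target(x):
    # Toy permutation-invariant density (assumed for illustration):
    # equal-weight mixture of Gaussians centred at (-2, 2) and (2, -2).
    a = -0.5 * np.sum((x - np.array([-2.0, 2.0])) ** 2) / 0.25
    b = -0.5 * np.sum((x - np.array([2.0, -2.0])) ** 2) / 0.25
    return np.logaddexp(a, b)

def relabel(x, mu, cov_inv, perms):
    # Choose the coordinate permutation P minimising the quadratic form
    # (Px - mu)^T cov_inv (Px - mu), and return the permuted point.
    best = min(perms, key=lambda p: (x[p] - mu) @ cov_inv @ (x[p] - mu))
    return x[best]

def amor_sketch(n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    d = 2
    perms = [np.array(p) for p in itertools.permutations(range(d))]
    jitter = 1e-6 * np.eye(d)          # keeps matrices well conditioned
    x = np.array([-1.0, 1.0])          # arbitrary starting point
    mu, cov = x.copy(), np.eye(d)      # running mean and covariance
    scale = 2.38 ** 2 / d              # common AM scaling heuristic
    samples = np.empty((n_iter, d))
    for t in range(1, n_iter + 1):
        # Propose using the current covariance estimate.
        prop = rng.multivariate_normal(x, scale * cov + jitter)
        # Relabel the proposal with the same covariance estimate.
        prop = relabel(prop, mu, np.linalg.inv(cov + jitter), perms)
        # Simplified acceptance step (assumption: plain Metropolis ratio).
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        # Stochastic-approximation update of mean and covariance.
        gamma = 1.0 / (t + 1)
        delta = x - mu
        mu = mu + gamma * delta
        cov = cov + gamma * (np.outer(delta, delta) - cov)
        samples[t - 1] = x
    return samples
```

Calling amor_sketch() returns an array of relabeled samples whose two coordinates can be summarized separately. Enumerating all permutations is only practical for a small number of exchangeable components.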

Article information

Bernoulli, Volume 21, Number 3 (2015), 1304–1340.

Received: October 2012
Revised: October 2013
First available in Project Euclid: 27 May 2015

Keywords: adaptive Markov chain Monte Carlo; label-switching; stochastic approximation; vector quantization


Bardenet, Rémi; Cappé, Olivier; Fort, Gersende; Kégl, Balázs. Adaptive MCMC with online relabeling. Bernoulli 21 (2015), no. 3, 1304–1340. doi:10.3150/13-BEJ578.


  • [1] Andrieu, C., Moulines, É. and Priouret, P. (2005). Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim. 44 283–312.
  • [2] Andrieu, C. and Robert, C.P. (2011). Controlled Markov chain Monte Carlo methods for optimal sampling. Technical Report 125, Cahiers du Ceremade, Université Paris Dauphine.
  • [3] Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Stat. Comput. 18 343–373.
  • [4] Atchadé, Y., Fort, G., Moulines, E. and Priouret, P. (2011). Adaptive Markov chain Monte Carlo: Theory and methods. In Bayesian Time Series Models 32–51. Cambridge: Cambridge Univ. Press.
  • [5] Bardenet, R., Cappé, O., Fort, G. and Kégl, B. (2012). Adaptive Metropolis with online relabeling. In International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR Workshop and Conference Proceedings 22 91–99. Microtome Publishing.
  • [6] Bardenet, R., Cappé, O., Fort, G. and Kégl, B. (2014). Supplement to “Adaptive MCMC with online relabeling.” DOI:10.3150/13-BEJ578SUPP.
  • [7] Bardenet, R. and Kégl, B. (2012). An adaptive Monte-Carlo Markov chain algorithm for inference from mixture signals. J. Phys. Conf. Ser. 368 012044.
  • [8] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge Univ. Press.
  • [9] Celeux, G. (1998). Bayesian inference for mixtures: The label-switching problem. In Computational Statistics Symposium (COMPSTAT) 227–232. Berlin: Springer.
  • [10] Celeux, G., Hurn, M. and Robert, C.P. (2000). Computational and inferential difficulties with mixture posterior distributions. J. Amer. Statist. Assoc. 95 957–970.
  • [11] Chen, H.-F. (2002). Stochastic Approximation and Its Applications. Nonconvex Optimization and Its Applications 64. Dordrecht: Kluwer Academic.
  • [12] Cron, A.J. and West, M. (2011). Efficient classification-based relabeling in mixture models. Amer. Statist. 65 16–20.
  • [13] Fort, G., Moulines, E. and Priouret, P. (2011). Convergence of adaptive and interacting Markov chain Monte Carlo algorithms. Ann. Statist. 39 3262–3289.
  • [14] Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions. Lecture Notes in Math. 1730. Berlin: Springer.
  • [15] Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711–732.
  • [16] Haario, H., Saksman, E. and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli 7 223–242.
  • [17] Jasra, A. (2005). Bayesian inference for mixture models via Monte Carlo. Ph.D. thesis, Imperial College, London, UK.
  • [18] Jasra, A., Holmes, C.C. and Stephens, D.A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statist. Sci. 20 50–67.
  • [19] Marin, J.-M., Mengersen, K. and Robert, C.P. (2005). Bayesian modelling and inference on mixtures of distributions. In Bayesian Thinking: Modeling and Computation. Handbook of Statist. 25 459–507. Amsterdam: Elsevier.
  • [20] Meyn, S.P. and Tweedie, R.L. (1993). Markov Chains and Stochastic Stability. Communications and Control Engineering Series. London: Springer.
  • [21] Pagès, G. (1998). A space quantization method for numerical integration. J. Comput. Appl. Math. 89 1–38.
  • [22] Papastamoulis, P. and Iliopoulos, G. (2010). An artificial allocations based solution to the label switching problem in Bayesian analysis of mixtures of distributions. J. Comput. Graph. Statist. 19 313–331.
  • [23] Richardson, S. and Green, P.J. (1997). On Bayesian analysis of mixtures with an unknown number of components. J. Roy. Statist. Soc. Ser. B 59 731–792.
  • [24] Robert, C.P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer Texts in Statistics. New York: Springer.
  • [25] Roberts, G.O., Gelman, A. and Gilks, W.R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120.
  • [26] Roberts, G.O. and Rosenthal, J.S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367.
  • [27] Roberts, G.O. and Rosenthal, J.S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Probab. 44 458–475.
  • [28] Roberts, G.O. and Rosenthal, J.S. (2009). Examples of adaptive MCMC. J. Comput. Graph. Statist. 18 349–367.
  • [29] Roodaki, A. (2012). Signal decompositions using trans-dimensional Bayesian methods. Ph.D. thesis, Supélec, Gif-sur-Yvette, France.
  • [30] Roodaki, A., Bect, J. and Fleury, G. (2012). Summarizing posterior distributions in signal decomposition problems when the number of components is unknown. In IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP) 3873–3876. Berlin: Springer.
  • [31] Sperrin, M., Jaki, T. and Wit, E. (2010). Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models. Stat. Comput. 20 357–366.
  • [32] Stephens, M. (2000). Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 795–809.

Supplemental materials

  • Long version of the paper. This long version of the paper features an additional evaluated method for Section 2.2 (AM with posterior reordering), examples of the behavior of AMOR on a nonlinear symmetrized unimodal distribution and on a genuinely bimodal distribution, and complete proofs.