Institute of Mathematical Statistics Collections

On Convergence Properties of the Monte Carlo EM Algorithm

Ronald C. Neath


Abstract

The Expectation-Maximization (EM) algorithm (Dempster, Laird and Rubin, 1977) is a popular method for computing maximum likelihood estimates (MLEs) in problems with missing data. Each iteration of the algorithm formally consists of an E-step: evaluate the expected complete-data log-likelihood given the observed data, with the expectation taken at the current parameter estimate; and an M-step: maximize the resulting expression to obtain the updated estimate. Conditions that guarantee convergence of the EM sequence to a unique MLE were found by Boyles (1983) and Wu (1983). In complicated models for high-dimensional data, it is common to encounter an intractable integral in the E-step. The Monte Carlo EM algorithm of Wei and Tanner (1990) works around this difficulty by maximizing instead a Monte Carlo approximation to the appropriate conditional expectation. Convergence properties of Monte Carlo EM have been studied, most notably, by Chan and Ledolter (1995) and Fort and Moulines (2003).
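The E-step/M-step cycle, and the Monte Carlo approximation of the E-step, can be illustrated in a minimal sketch. The model below (an exponential mean with right-censored observations) is a hypothetical example chosen for its closed-form steps, not one of the examples from the paper; the function names are invented for illustration. Here the E-step expectation E[X | X > c] = c + θ is available in closed form by memorylessness, so the exact EM update and its Monte Carlo counterpart can be compared directly.

```python
import random


def em_exponential_censored(obs, censored, c, theta0=1.0, iters=200):
    """Exact EM for the mean theta of an Exponential(theta) sample
    right-censored at c.  obs[i] holds the observed value (equal to c
    when censored[i] is True)."""
    n = len(obs)
    theta = theta0
    for _ in range(iters):
        # E-step: replace each censored value by its conditional mean,
        # E[X | X > c] = c + theta (memoryless property).
        filled = [c + theta if cen else y for y, cen in zip(obs, censored)]
        # M-step: the complete-data MLE is just the sample mean.
        theta = sum(filled) / n
    return theta


def mcem_exponential_censored(obs, censored, c, theta0=1.0, iters=200,
                              m=500, seed=0):
    """Monte Carlo EM: the E-step expectation is replaced by an average
    of m simulated draws from the conditional distribution of X given
    X > c, which is c + Exponential(theta)."""
    rng = random.Random(seed)
    n = len(obs)
    theta = theta0
    for _ in range(iters):
        total = 0.0
        for y, cen in zip(obs, censored):
            if cen:
                # Monte Carlo E-step for a censored observation.
                total += sum(c + rng.expovariate(1.0 / theta)
                             for _ in range(m)) / m
            else:
                total += y
        theta = total / n  # M-step on the (approximately) filled-in data
    return theta
```

For this model the observed-data MLE is available in closed form (total observed time divided by the number of uncensored observations), so the exact EM iterates can be checked against it; the MCEM iterates fluctuate around the same value with an error governed by the Monte Carlo sample size m.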

The goal of this review paper is to provide an accessible but rigorous introduction to the convergence properties of EM and Monte Carlo EM. No previous knowledge of the EM algorithm is assumed. We demonstrate the implementation of EM and Monte Carlo EM in two simple but realistic examples. We show that if the EM algorithm converges it converges to a stationary point of the likelihood, and that the rate of convergence is linear at best. For Monte Carlo EM we present a readable proof of the main result of Chan and Ledolter (1995), and state without proof the conclusions of Fort and Moulines (2003). An important practical implication of Fort and Moulines’s (2003) result relates to the determination of Monte Carlo sample sizes in MCEM; we provide a brief review of the literature (Booth and Hobert, 1999; Caffo, Jank and Jones, 2005) on that problem.
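The claim that EM converges at best linearly can also be checked numerically: successive errors shrink by a constant factor, which for missing-data problems equals the fraction of missing information. The snippet below, a self-contained sketch using the same hypothetical censored-exponential model as above (half the observations censored, so the rate is exactly 0.5), tracks the ratio of successive errors along the EM path.

```python
def em_update(theta, obs, censored, c):
    """One EM step for the censored-exponential mean (illustrative model)."""
    filled = [c + theta if cen else y for y, cen in zip(obs, censored)]
    return sum(filled) / len(obs)


obs = [0.5, 1.2, 2.0, 2.0, 0.8, 2.0]
censored = [False, False, True, True, False, True]
mle = sum(obs) / sum(1 for cen in censored if not cen)  # closed-form MLE

theta = 10.0            # deliberately poor starting value
prev_err = abs(theta - mle)
ratios = []
for _ in range(10):
    theta = em_update(theta, obs, censored, c=2.0)
    err = abs(theta - mle)
    ratios.append(err / prev_err)  # linear rate: this ratio is constant
    prev_err = err
# Each ratio equals 0.5, the fraction of censored (missing) observations.
```

In this model the EM map is affine, theta_new = a + (m/n) * theta, so the error contracts by exactly m/n = 0.5 at every step; in general the ratios only stabilize near the rate as the iterates approach the limit.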

Chapter information

Source
Galin Jones and Xiaotong Shen, eds., Advances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L. Eaton (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2013), 43–62

Dates
First available in Project Euclid: 23 September 2013

Permanent link to this document
https://projecteuclid.org/euclid.imsc/1379942047

Digital Object Identifier
doi:10.1214/12-IMSCOLL1003

Mathematical Reviews number (MathSciNet)
MR3586938

Zentralblatt MATH identifier
1329.62287

Subjects
Primary: 62-02: Research exposition (monographs, survey articles)

Keywords
convergence; EM algorithm; maximum likelihood; mixed model; Monte Carlo

Rights
Copyright © 2013, Institute of Mathematical Statistics

Citation

Neath, Ronald C. On Convergence Properties of the Monte Carlo EM Algorithm. Advances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L. Eaton, 43–62, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2013. doi:10.1214/12-IMSCOLL1003. https://projecteuclid.org/euclid.imsc/1379942047

References

  • Arrowsmith, D. K. and Place, C. M. (1992). Dynamical Systems: Differential Equations, Maps and Chaotic Behavior. Chapman & Hall, London.
  • Billingsley, P. (1995). Probability and Measure, third ed. Wiley, New York.
  • Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B 61 265–285.
  • Booth, J. G., Hobert, J. P. and Jank, W. (2001). A survey of Monte Carlo algorithms for maximizing the likelihood of a two-stage hierarchical model. Statistical Modelling: An International Journal 1 333–349.
  • Boyles, R. A. (1983). On the convergence of the EM algorithm. Journal of the Royal Statistical Society, Series B 45 47–50.
  • Caffo, B. S., Jank, W. and Jones, G. L. (2005). Ascent-based Monte Carlo EM. Journal of the Royal Statistical Society, Series B 67 235–251.
  • Chan, K. S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series involving counts. Journal of the American Statistical Association 90 242–252.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39 1–22.
  • Fletcher, R. (1987). Practical Methods of Optimization, Second ed. Wiley, New York.
  • Fort, G. and Moulines, E. (2003). Convergence of the Monte Carlo expectation maximization for curved exponential families. The Annals of Statistics 31 1220–1259.
  • Geyer, C. J. (1998). Course notes: Inequality-constrained statistical inference, School of Statistics, University of Minnesota.
  • Heath, J. W., Fu, M. C. and Jank, W. (2009). New global optimization algorithms for model-based clustering. Computational Statistics & Data Analysis 53 3999–4017.
  • Jank, W. (2004). Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM. Computational Statistics & Data Analysis 48 685–701.
  • Johnson, A. A., Jones, G. L. and Neath, R. C. (2011). Component-wise Markov chain Monte Carlo. ArXiv e-prints.
  • Jones, G. L. and Hobert, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science 16 312–334.
  • Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society, Series B 57 425–437.
  • L’Ecuyer, P. and Lemieux, C. (2002). Recent advances in randomized quasi-Monte Carlo methods. In Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications (M. Dror, P. L’Ecuyer and F. Szidarovski, eds.) 419–474. Kluwer Academic Publishers, Norwell, Massachusetts.
  • Levine, R. A. and Casella, G. (2001). Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics 10 422–439.
  • Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44 226–233.
  • McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association 92 162–170.
  • McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley, New York.
  • Murray, G. D. (1977). Discussion of the paper by Professor Dempster et al. Journal of the Royal Statistical Society, Series B 39 27–28.
  • Neath, R. C. (2006). Monte Carlo methods for likelihood-based inference in hierarchical models. PhD thesis, School of Statistics, University of Minnesota.
  • Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, second ed. Springer-Verlag, New York.
  • Sherman, R. P., Ho, Y.-Y. K. and Dalal, S. R. (1997). Conditions for convergence of Monte Carlo EM sequences with an application to product diffusion modeling. Econometrics Journal 2 248–267.
  • Snedecor, G. W. and Cochran, W. G. (1989). Statistical Methods, eighth ed. Iowa State University Press, Ames.
  • Tu, Y., Ball, M. and Jank, W. (2008). Estimating flight departure delay distributions–a statistical approach with long-term trend and short-term pattern. Journal of the American Statistical Association 103 112–125.
  • Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association 85 699–704.
  • Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11 95–103.