Open Access
February 2017 Statistical guarantees for the EM algorithm: From population to sample-based analysis
Sivaraman Balakrishnan, Martin J. Wainwright, Bin Yu
Ann. Statist. 45(1): 77-120 (February 2017). DOI: 10.1214/16-AOS1435


The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (obtained in the limit of infinite data). This paper develops a theoretical framework for quantifying when and how quickly EM-type iterates converge to a small neighborhood of a given global optimum of the population likelihood. For correctly specified models, such a characterization yields rigorous guarantees on the performance of certain two-stage estimators in which a suitable initial pilot estimator is refined with iterations of the EM algorithm. Our analysis is divided into two parts: a treatment of the EM and first-order EM algorithms at the population level, followed by results that apply to these algorithms on a finite set of samples. Our conditions allow for a characterization of the region of convergence of EM-type iterates to a given population fixed point, that is, the region of the parameter space over which convergence is guaranteed to a point within a small neighborhood of the specified population fixed point. We verify our conditions and give tight characterizations of the region of convergence for three canonical problems of interest: symmetric mixture of two Gaussians, symmetric mixture of two regressions and linear regression with covariates missing completely at random.


Download Citation

Sivaraman Balakrishnan. Martin J. Wainwright. Bin Yu. "Statistical guarantees for the EM algorithm: From population to sample-based analysis." Ann. Statist. 45 (1) 77 - 120, February 2017.


Received: 1 September 2015; Revised: 1 January 2016; Published: February 2017
First available in Project Euclid: 21 February 2017

MathSciNet: MR3611487
zbMATH: 1367.62052
Digital Object Identifier: 10.1214/16-AOS1435

Primary: 60K35 , 62F10
Secondary: 90C30

Keywords: EM algorithm , first-order EM algorithm , maximum likelihood estimation , nonconvex optimization

Rights: Copyright © 2017 Institute of Mathematical Statistics

Vol.45 • No. 1 • February 2017
Back to Top