Electronic Journal of Statistics

Learning mixtures of Bernoulli templates by two-round EM with performance guarantee

Adrian Barbu, Tianfu Wu, and Ying Nian Wu

Full-text: Open access


Dasgupta and Shulman [1] showed that a two-round variant of the EM algorithm can learn mixtures of Gaussian distributions with near-optimal precision with high probability, provided the Gaussian distributions are well separated and the dimension is sufficiently high. In this paper, we generalize their theory to learning mixtures of high-dimensional Bernoulli templates. Each template is a binary vector, and a template generates examples by switching each of its binary components independently with a certain probability. In computer vision applications, a binary vector is a feature map of an image, where each binary component indicates whether a local feature or structure is present or absent within a certain cell of the image domain. A Bernoulli template can be regarded as a statistical model for images of objects (or parts of objects) from the same category. We show that the two-round EM algorithm can learn mixtures of Bernoulli templates with near-optimal precision with high probability, provided the Bernoulli templates are sufficiently different and the number of features is sufficiently high. We illustrate the theoretical results with synthetic and real examples.
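The generative model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the two-round EM algorithm. It only samples examples from a mixture of Bernoulli templates and checks that, when the templates are well separated and the number of features is large, assigning each example to its nearest template in Hamming distance recovers the generating component. All function names, the mixing weights, the flip probability of 0.2, and the choice of 500 features are assumptions made for illustration.

```python
import numpy as np


def sample_mixture(templates, weights, flip_prob, n, rng):
    """Draw n examples from a mixture of Bernoulli templates.

    Each example picks a template according to `weights`, then flips each
    binary component independently with probability `flip_prob`.
    """
    labels = rng.choice(len(templates), size=n, p=weights)
    flips = rng.random((n, templates.shape[1])) < flip_prob
    examples = np.logical_xor(templates[labels].astype(bool), flips)
    return examples.astype(np.uint8), labels


def nearest_template(examples, templates):
    """Assign each example to the template with the smallest Hamming distance."""
    dists = (examples[:, None, :] != templates[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)

# Hypothetical setup: 3 random templates over 500 binary features.
templates = rng.integers(0, 2, size=(3, 500), dtype=np.uint8)
examples, labels = sample_mixture(templates, [0.5, 0.3, 0.2], 0.2, 1000, rng)

# With known templates, nearest-template assignment separates the components.
assigned = nearest_template(examples, templates)
accuracy = (assigned == labels).mean()
print(f"fraction of examples assigned to their generating template: {accuracy:.3f}")
```

In this sketch the templates are assumed known, so it shows only why separation and high dimension make the components easy to tell apart; learning the templates from unlabeled examples is the problem the paper addresses.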

Article information

Electron. J. Statist., Volume 8, Number 2 (2014), 3004-3030.

First available in Project Euclid: 15 January 2015

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1421330628

Digital Object Identifier: doi:10.1214/14-EJS981


Keywords: Clustering; performance bounds; unsupervised learning


Barbu, Adrian; Wu, Tianfu; Wu, Ying Nian. Learning mixtures of Bernoulli templates by two-round EM with performance guarantee. Electron. J. Statist. 8 (2014), no. 2, 3004–3030. doi:10.1214/14-EJS981. https://projecteuclid.org/euclid.ejs/1421330628



  • [1] Dasgupta, S. and Shulman, L. J., A two-round variant of EM for Gaussian mixtures, Proceedings of 16th Conference on Uncertainty in Artificial Intelligence (UAI-2000), 152–159, 2000.
  • [2] Daugman, J. G., Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Trans. on Acoustics, Speech and Signal Processing, 36, 1169–1179, 1988.
  • [3] Dempster, A. P., Laird, N. M., and Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society, B, 39, 1–38, 1977.
  • [4] Fraley, C. and Raftery, A. E., Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association, 97, 611–631, 2002.
  • [5] Goodman, L. A., Exploratory latent structure analysis using both identifiable and unidentifiable models, Biometrika, 61, 215–231, 1974.
  • [6] Huo, X. and Donoho, D. L., Applications of beamlets to detection and extraction of lines, curves and objects in very noisy images, Nonlinear Signal and Image Processing, 2001.
  • [7] Schwarz, G. E., Estimating the dimension of a model, Annals of Statistics, 6, 461–464, 1978.
  • [8] Si, Z., Gong, H., Zhu, S. C., and Wu, Y. N., Learning active basis models by EM-type algorithms, Statistical Science, 25, 458–475, 2010.
  • [9] Vapnik, V. N., The Nature of Statistical Learning Theory, Springer, 2000.
  • [10] Zhu, S. C. and Mumford, D. B., A stochastic grammar of images, Foundations and Trends in Computer Graphics and Vision, 2, 259–362, 2006.