Electronic Journal of Statistics

Variational Bayesian inference with Gaussian-mixture approximations

O. Zobay



Variational Bayesian inference with a Gaussian posterior approximation provides an alternative to the more commonly employed factorization approach and enlarges the range of tractable distributions. In this paper, we propose an extension to the Gaussian approach that uses Gaussian mixtures as approximations. A general problem for variational inference with mixtures is the calculation of the entropy term in the Kullback-Leibler distance, which becomes analytically intractable. We deal with this problem by using a simple lower bound for the entropy and imposing restrictions on the form of the Gaussian covariance matrix; in this way, efficient numerical calculations become possible. To illustrate the method, we discuss its application to an isotropic generalized normal target density, a non-Gaussian state-space model, and the Bayesian lasso. For heavy-tailed distributions, the examples show that the mixture approach indeed leads to improved approximations in the sense of a reduced Kullback-Leibler distance. From a more practical point of view, mixtures can improve estimates of posterior marginal variances. Furthermore, they provide an initial estimate of posterior skewness, which is not possible with single Gaussians. We also discuss general sufficient conditions under which mixtures are guaranteed to improve on single-component approximations.
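To make the entropy problem concrete: the entropy of a Gaussian mixture has no closed form, but a simple Jensen-inequality lower bound does, since the overlap integral of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means (see the Huber et al. 2008 reference below for bounds of this type). The sketch below illustrates such a bound; it is an assumption that this is representative of the kind of bound the paper uses, not a reproduction of the paper's method, and the function names are hypothetical.

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) evaluated at x."""
    d = len(mean)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def mixture_entropy_lower_bound(weights, means, covs):
    """Jensen-inequality lower bound on the entropy of a Gaussian mixture:

        H(q) >= -sum_i w_i log sum_j w_j N(mu_i; mu_j, Sigma_i + Sigma_j),

    where the inner terms are the pairwise Gaussian overlap integrals,
    each available in closed form.
    """
    K = len(weights)
    z = np.array([[gauss_pdf(means[i], means[j], covs[i] + covs[j])
                   for j in range(K)] for i in range(K)])
    return -np.sum(weights * np.log(z @ weights))

# Two-component mixture in one dimension (illustrative values).
w = np.array([0.6, 0.4])
m = [np.array([-1.0]), np.array([2.0])]
C = [np.eye(1) * 0.5, np.eye(1) * 1.5]
print(mixture_entropy_lower_bound(w, m, C))
```

Note that the bound is not tight even for a single component (it evaluates to (d/2) log(4π) + (1/2) log|Σ| rather than the exact (d/2) log(2πe) + (1/2) log|Σ|), but it is analytically cheap, which is what makes efficient numerical optimization of the variational objective possible.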

Article information

Electron. J. Statist., Volume 8, Number 1 (2014), 355-389.

First available in Project Euclid: 18 April 2014


Primary: 62F15: Bayesian inference
Secondary: 62E17: Approximations to distributions (nonasymptotic)

Keywords: approximation methods; variational inference; normal mixtures; Bayesian lasso; state-space models


Zobay, O. Variational Bayesian inference with Gaussian-mixture approximations. Electron. J. Statist. 8 (2014), no. 1, 355--389. doi:10.1214/14-EJS887. https://projecteuclid.org/euclid.ejs/1397826705



  • Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, New York.
  • Chen, J.-Y., Hershey, J. R., Olsen, P. A., and Yashchin, E. (2008). Accelerated Monte Carlo for Kullback-Leibler divergence between Gaussian mixture models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4553–4556.
  • Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. Ann. Stat. 32, 407–499.
  • Gilks, W. R., Best, N. G., and Tan, K. K. C. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Appl. Statist. 44, 455–472.
  • Goldberger, J., Gordon, S., and Greenspan, H. (2003). An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. In Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV’03), 487–493.
  • Hans, C. (2009). Bayesian lasso regression. Biometrika 96, 835–845.
  • Hans, C. (2010). Model uncertainty and variable selection in Bayesian lasso regression. Statistics and Computing 20, 221–229.
  • Hershey, J. R. and Olsen, P. A. (2007). Approximating the Kullback-Leibler divergence between Gaussian mixture models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 2007, IV-317–IV-320.
  • Huber, M. F., Bailey, T., Durrant-Whyte, H., and Hanebeck, U. D. (2008). On entropy approximation for Gaussian mixture random vectors. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems 2008, 181–188.
  • Jaakkola, T. S. and Jordan, M. I. (1998). Improving the mean field approximation via the use of mixture distributions. In Learning in Graphical Models, ed. M. I. Jordan, MIT Press, Cambridge, MA, 163–174.
  • MacKay, D. J. C. (2003). Information theory, inference and learning algorithms. Cambridge University Press, New York.
  • Minka, T. (2005). Divergence measures and message passing. Microsoft Technical Report MSR-TR-2005-173.
  • Opper, M. and Archambeau, C. (2009). The variational Gaussian approximation revisited. Neural Computation 21, 786–792.
  • Opper, M. and Saad, D. (eds.) (2001). Advanced mean field methods: theory and practice. Neural Information Processing Series. MIT Press, Cambridge, MA.
  • Ormerod, J. T. and Wand, M. P. (2010). Explaining variational approximations. The American Statistician 64, 140–153.
  • Park, T. and Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association 103, 681–686.
  • Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge.
  • Smidl, V. and Quinn, A. (2006). The Variational Bayes Method in Signal Processing. Springer, Berlin, Heidelberg, New York.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–288.
  • Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learning 1, 1–305.
  • Zhang, S. and Jin, J. (1996). Computation of Special Functions. Wiley, New York.
  • Zobay, O. (2009). Mean field inference for the Dirichlet process mixture model. Electronic Journal of Statistics 3, 507–545.