Statistical Science

Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models

Abstract

Variational Bayes (VB) is a common strategy for approximate Bayesian inference, but simple methods are available only for specific classes of models, in particular those admitting conditionally conjugate constructions within an exponential family. Models with logit components appear to be a notable exception to this class, due to the absence of conjugacy between the logistic likelihood and the Gaussian priors for the coefficients in the linear predictor. To facilitate approximate inference within this widely used class of models, Jaakkola and Jordan (Stat. Comput. 10 (2000) 25–37) proposed a simple variational approach that relies on a family of tangent quadratic lower bounds of the logistic log-likelihood, thus restoring conjugacy between these approximate bounds and the Gaussian priors. This strategy is still implemented successfully, but few attempts have been made to formally understand the reasons underlying its excellent performance. Following a review of VB for logistic models, we fill this gap by providing a formal connection between the above bound and a recent Pólya-gamma data augmentation for logistic regression. This result places the computational methods associated with the aforementioned bounds within the framework of variational inference for conditionally conjugate exponential family models, thereby allowing recent advances for this class to be inherited by the methods relying on Jaakkola and Jordan (Stat. Comput. 10 (2000) 25–37).
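To make the tangent quadratic bound concrete, the following is a minimal Python sketch, not the authors' code. It assumes the standard form of the Jaakkola and Jordan bound, log σ(x) ≥ log σ(ξ) + (x − ξ)/2 − λ(ξ)(x² − ξ²) with λ(ξ) = tanh(ξ/2)/(4ξ), together with a zero-mean isotropic Gaussian prior on the coefficients. The function names (`jj_lambda`, `jj_bound`, `cavi_logistic`) and the arguments `prior_var` and `n_iter` are hypothetical and chosen only for illustration. The quantity 2λ(ξ_i) used in the update is the one that corresponds to the mean of a Pólya-gamma PG(1, ξ_i) variable, which is the link the abstract refers to.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -np.logaddexp(0.0, -x)

def jj_lambda(xi):
    # lambda(xi) = tanh(xi / 2) / (4 * xi), with limiting value 1/8 at xi = 0.
    xi = np.asarray(xi, dtype=float)
    near_zero = np.abs(xi) < 1e-8
    safe = np.where(near_zero, 1.0, xi)
    return np.where(near_zero, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))

def jj_bound(x, xi):
    # Quadratic lower bound of log_sigmoid(x), tangent at x = xi and x = -xi.
    return log_sigmoid(xi) + 0.5 * (x - xi) - jj_lambda(xi) * (x**2 - xi**2)

def cavi_logistic(X, y, prior_var=10.0, n_iter=50):
    # Coordinate-ascent updates for a Gaussian approximation N(mu, Sigma)
    # of the coefficients, under an assumed N(0, prior_var * I) prior
    # and binary responses y coded as 0/1.
    n, p = X.shape
    prior_prec = np.eye(p) / prior_var
    xi = np.ones(n)
    for _ in range(n_iter):
        # 2 * lambda(xi_i) = tanh(xi_i / 2) / (2 * xi_i), the PG(1, xi_i) mean.
        omega = 2.0 * jj_lambda(xi)
        Sigma = np.linalg.inv(prior_prec + X.T @ (omega[:, None] * X))
        mu = Sigma @ (X.T @ (y - 0.5))
        # xi_i^2 = x_i^T (Sigma + mu mu^T) x_i
        second_moment = Sigma + np.outer(mu, mu)
        xi = np.sqrt(np.einsum("ij,jk,ik->i", X, second_moment, X))
    return mu, Sigma, xi

# Small simulated example (the data-generating values are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
beta_true = np.array([2.0, -1.0])
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
mu, Sigma, xi = cavi_logistic(X, y)
```

Because each update is available in closed form, every step is a Gaussian or scalar calculation; this is the conditionally conjugate structure that the Pólya-gamma representation makes explicit.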

Article information

Source
Statist. Sci., Volume 34, Number 3 (2019), 472–485.

Dates
First available in Project Euclid: 11 October 2019

https://projecteuclid.org/euclid.ss/1570780980

Digital Object Identifier
doi:10.1214/19-STS712

Citation

Durante, Daniele; Rigon, Tommaso. Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Statist. Sci. 34 (2019), no. 3, 472–485. doi:10.1214/19-STS712. https://projecteuclid.org/euclid.ss/1570780980

References

• Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
• Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: With application to scoring graphical model structures. In Bayesian Statistics, 7 (Tenerife, 2002) 453–463. Oxford Univ. Press, New York.
• Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York.
• Bishop, C. M. and Svensén, M. (2003). Bayesian hierarchical mixtures of experts. Proc. Conf. Uncertain. Artif. Intell. 57–64.
• Blei, D. M., Kucukelbir, A. and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. J. Amer. Statist. Assoc. 112 859–877.
• Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993–1022.
• Böhning, D. and Lindsay, B. G. (1988). Monotonicity of quadratic-approximation algorithms. Ann. Inst. Statist. Math. 40 641–663.
• Braun, M. and McAuliffe, J. (2010). Variational inference for large-scale models of discrete choice. J. Amer. Statist. Assoc. 105 324–335.
• Browne, R. P. and McNicholas, P. D. (2015). Multivariate sharp quadratic bounds via $\boldsymbol{\Sigma}$-strong convexity and the Fenchel connection. Electron. J. Stat. 9 1913–1938.
• Carbonetto, P. and Stephens, M. (2012). Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Anal. 7 73–107.
• Choi, H. M. and Hobert, J. P. (2013). The Polya-gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron. J. Stat. 7 2054–2064.
• de Leeuw, J. and Lange, K. (2009). Sharp quadratic majorization in one dimension. Comput. Statist. Data Anal. 53 2471–2484.
• Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38.
• Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398–409.
• Giordano, R. J., Broderick, T. and Jordan, M. I. (2015). Linear response methods for accurate covariance estimates from mean field variational Bayes. Adv. Neural Inf. Process. Syst. 1441–1449.
• Hoffman, M. D., Blei, D. M., Wang, C. and Paisley, J. (2013). Stochastic variational inference. J. Mach. Learn. Res. 14 1303–1347.
• Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30–37.
• Jaakkola, T. S. and Jordan, M. I. (2000). Bayesian parameter estimation via variational methods. Stat. Comput. 10 25–37.
• Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37 183–233.
• Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Stat. 22 79–86.
• Lee, S., Huang, J. Z. and Hu, J. (2010). Sparse logistic principal components analysis for binary data. Ann. Appl. Stat. 4 1579–1601.
• McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley Series in Probability and Statistics: Applied Probability and Statistics. Wiley, New York.
• Ormerod, J. T. and Wand, M. P. (2010). Explaining variational approximations. Amer. Statist. 64 140–153.
• Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339–1349.
• Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
• Ren, L., Du, L., Carin, L. and Dunson, D. B. (2011). Logistic stick-breaking process. J. Mach. Learn. Res. 12 203–239.
• Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Stat. 22 400–407.
• Scott, J. G. and Sun, L. (2013). Expectation-maximization for logistic regression. Available at arXiv:1306.0040.
• Spall, J. C. (2003). Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley Interscience, Hoboken, NJ.
• Tang, Y., Browne, R. P. and McNicholas, P. D. (2015). Model based clustering of high-dimensional binary data. Comput. Statist. Data Anal. 87 84–101.
• Wand, M. P. (2017). Fast approximate inference for arbitrarily large semiparametric regression models via message passing. J. Amer. Statist. Assoc. 112 137–156.
• Wand, M. P., Ormerod, J. T., Padoan, S. A. and Frühwirth, R. (2011). Mean field variational Bayes for elaborate distributions. Bayesian Anal. 6 847–900.
• Wang, C. and Blei, D. M. (2013). Variational inference in nonconjugate models. J. Mach. Learn. Res. 14 1005–1031.
• Wang, B. and Titterington, D. M. (2004). Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. Proc. Conf. Uncertain. Artif. Intell. 577–584.
• Zhu, L. (2012). New inequalities for hyperbolic functions and their applications. J. Inequal. Appl. 303 1–29.