Bayesian Analysis

Variational Hamiltonian Monte Carlo via Score Matching

Cheng Zhang, Babak Shahbaba, and Hongkai Zhao

Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational methods and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed that combine variational Bayesian inference and MCMC simulation to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this paper, we explore the possibility of incorporating variational approximation into a state-of-the-art MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the expensive computation required in the sampling procedure, which is the bottleneck for many applications of HMC in big data problems. To this end, we exploit the regularity in parameter space to construct a free-form approximation of the target distribution by a fast and flexible surrogate function using an optimized additive model of proper random bases, which can also be viewed as a single-hidden-layer feedforward neural network. The surrogate function provides a sufficiently accurate approximation while allowing for fast computation in the sampling procedure, resulting in an efficient approximate Bayesian inference algorithm. We demonstrate the advantages of our proposed method using both synthetic and real data problems.
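The surrogate-plus-HMC idea in the abstract can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are: a toy correlated 2-D Gaussian target; tanh random bases with N(0, 1) hidden weights and offsets that stay fixed; fitting only the output weights by ridge-regularized least squares between surrogate and exact gradients at training points (the explicit, Fisher-divergence form of score matching, which under mild conditions equals Hyvärinen's score-matching objective up to a constant); collecting those training points with a short exact-gradient HMC run; and keeping the exact log density in the Metropolis step. The names `RandomBasisSurrogate`, `hmc_step`, and `run_chain` are hypothetical.

```python
import numpy as np

# Toy target (assumption): a correlated 2-D Gaussian standing in for an
# expensive posterior whose gradient we would like to avoid evaluating.
SIGMA = np.array([[1.0, 0.8], [0.8, 1.0]])
PREC = np.linalg.inv(SIGMA)


def log_target(theta):
    return -0.5 * theta @ PREC @ theta


def grad_log_target(theta):
    return -PREC @ theta


class RandomBasisSurrogate:
    """Hypothetical surrogate f(theta) = sum_i v_i * tanh(w_i . theta + d_i).

    A single-hidden-layer network whose hidden weights (W, d) are random and
    fixed; only the output weights v are fitted.
    """

    def __init__(self, dim, n_bases, rng):
        self.W = rng.normal(size=(n_bases, dim))
        self.d = rng.normal(size=n_bases)
        self.v = np.zeros(n_bases)

    def fit(self, X, Y, ridge=1e-3):
        """Minimize mean ||grad f(x_k) - y_k||^2 + ridge * ||v||^2,
        where y_k = grad log target(x_k) (explicit score matching)."""
        S = 1.0 - np.tanh(X @ self.W.T + self.d) ** 2   # tanh' at each point, (n, m)
        A = (self.W @ self.W.T) * (S.T @ S / len(X))    # mean of G_k G_k^T
        rhs = (S * (Y @ self.W.T)).mean(axis=0)          # mean of G_k y_k
        self.v = np.linalg.solve(A + ridge * np.eye(len(rhs)), rhs)
        return self

    def grad(self, theta):
        s = 1.0 - np.tanh(self.W @ theta + self.d) ** 2
        return self.W.T @ (self.v * s)


def hmc_step(theta, grad_fn, rng, eps=0.15, n_leap=10):
    """Leapfrog driven by grad_fn (exact or surrogate); the Metropolis test
    always uses the exact log density."""
    p0 = rng.normal(size=theta.shape)
    q, p = theta.copy(), p0 + 0.5 * eps * grad_fn(theta)
    for i in range(n_leap):
        q = q + eps * p
        if i < n_leap - 1:
            p = p + eps * grad_fn(q)
    p = p + 0.5 * eps * grad_fn(q)
    log_ratio = (log_target(q) - 0.5 * p @ p) - (log_target(theta) - 0.5 * p0 @ p0)
    if np.log(rng.uniform()) < log_ratio:
        return q, True
    return theta, False


def run_chain(theta0, grad_fn, n_iter, rng):
    theta, samples, n_acc = theta0, [], 0
    for _ in range(n_iter):
        theta, accepted = hmc_step(theta, grad_fn, rng)
        samples.append(theta)
        n_acc += accepted
    return np.array(samples), n_acc / n_iter


rng = np.random.default_rng(0)
# 1) Training phase (assumption): exact-gradient HMC collects points.
warm, _ = run_chain(np.zeros(2), grad_log_target, 1000, rng)
X = warm[200:]
Y = np.array([grad_log_target(x) for x in X])
# 2) Fit the surrogate's output weights.
surrogate = RandomBasisSurrogate(dim=2, n_bases=100, rng=rng).fit(X, Y)
# 3) Sampling phase: the leapfrog uses only the cheap surrogate gradient.
samples, acc_rate = run_chain(warm[-1], surrogate.grad, 2000, rng)
print(f"acceptance rate {acc_rate:.2f}, sample mean {samples.mean(axis=0).round(2)}")
```

Because every leapfrog update is a shear, the leapfrog map stays volume-preserving and reversible whatever gradient drives it. The exact Metropolis test therefore keeps the target invariant, and surrogate error shows up only as a lower acceptance rate. In this design each sampling iteration needs one exact log-density evaluation instead of `n_leap` exact gradient evaluations.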

Article information

Bayesian Anal., Volume 13, Number 2 (2018), 485–506.

First available in Project Euclid: 25 July 2017

Digital Object Identifier: doi:10.1214/17-BA1060

Primary: 65C60: Computational problems in statistics
Secondary: 65C05: Monte Carlo methods

Keywords: Markov chain Monte Carlo, variational inference, free-form approximation

Creative Commons Attribution 4.0 International License.


Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai. Variational Hamiltonian Monte Carlo via Score Matching. Bayesian Anal. 13 (2018), no. 2, 485–506. doi:10.1214/17-BA1060.
