The Annals of Statistics

The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data

Joris Bierkens, Paul Fearnhead, and Gareth Roberts



Abstract

Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use sub-sampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multidimensional version of the Zig-Zag process of [Ann. Appl. Probab. 27 (2017) 846–882], a continuous-time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction (a property which is known to inhibit rapid convergence) the Zig-Zag process offers a flexible nonreversible alternative which we observe to often have favourable convergence properties. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme, that is, the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then the Zig-Zag process can be super-efficient: after an initial preprocessing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
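To make the abstract concrete, the following sketch (ours, not code from the paper) simulates the one-dimensional Zig-Zag process for a standard Gaussian target. The switching rate is λ(x, θ) = max(0, θx), and for this target the integrated rate inverts in closed form, so event times are drawn exactly — illustrating the "no discretisation error" claim. Function names and defaults are our own choices.

```python
import numpy as np

def zigzag_gaussian(T_max, seed=1, x0=0.0, theta0=1.0):
    """One-dimensional Zig-Zag process targeting N(0, 1).

    The velocity theta is +1 or -1; the position moves linearly and the
    velocity flips at events of an inhomogeneous Poisson process with
    rate lambda(x, theta) = max(0, theta * x).  Solving
    int_0^tau max(0, theta*x + s) ds = E with E ~ Exp(1) gives the next
    event time in closed form, so no thinning or discretisation is needed.
    """
    rng = np.random.default_rng(seed)
    t, x, theta = 0.0, x0, theta0
    times, xs = [t], [x]
    while t < T_max:
        a = theta * x                      # directional derivative of U(x) = x^2 / 2
        E = rng.exponential()
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * E)  # exact inverse of the integrated rate
        t += tau
        x += theta * tau                   # deterministic linear flight
        theta = -theta                     # velocity flip at the event
        times.append(t)
        xs.append(x)
    return np.array(times), np.array(xs)

def path_average(times, xs, f, n_grid=200_000):
    """Time average of f(X_t) along the trajectory.  The path is piecewise
    linear, so linear interpolation of the event skeleton is exact."""
    grid = np.linspace(0.0, times[-1], n_grid)
    return f(np.interp(grid, times, xs)).mean()
```

Estimates are formed as time averages along the continuous path rather than from discrete samples; with the Gaussian target, the time-averaged mean and second moment should be close to 0 and 1.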
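The sub-sampling idea can likewise be sketched on a toy conjugate model of our own choosing (not an example from the paper): data y_1, …, y_n with a N(x, 1) likelihood and flat prior, so the posterior is N(ȳ, 1/n). Each proposed switch evaluates a single randomly chosen data point, and proposals are thinned against a bound computable from the current state; as the abstract states, the posterior remains the exact stationary distribution despite never touching the full data set at an event.

```python
import numpy as np

def zigzag_subsample(y, T_max, seed=2, x0=0.0, theta0=1.0):
    """Zig-Zag with sub-sampling (ZZ-SS style) for x | y under a N(x, 1)
    likelihood and flat prior; the target posterior is N(mean(y), 1/n).

    The full-data rate uses U'(x) = sum_i (x - y_i); the one-point unbiased
    surrogate is n * (x - y_I) for a uniform index I.  Along a flight from
    (x, theta), every surrogate rate is bounded by n * (b + t) with
    b = max(theta * x, 0) + max_i |y_i|, which lets us thin exactly.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    ymax = np.abs(y).max()
    t, x, theta = 0.0, x0, theta0
    times, xs = [t], [x]
    while t < T_max:
        b = max(theta * x, 0.0) + ymax                  # bound intercept, valid from here on
        E = rng.exponential()
        tau = -b + np.sqrt(b * b + 2.0 * E / n)         # invert integrated bound rate n*(b+s)
        t += tau
        x += theta * tau
        i = rng.integers(n)                             # single sub-sampled observation
        lam_i = n * max(0.0, theta * x - theta * y[i])  # one-point rate estimate at proposal
        if rng.random() < lam_i / (n * (b + tau)):      # thinning acceptance
            theta = -theta
            times.append(t)                             # record the skeleton only at flips
            xs.append(x)
    times.append(t)                                     # close the path at the final state
    xs.append(x)
    return np.array(times), np.array(xs)
```

Because the velocity is constant between recorded flips, linearly interpolating the flip skeleton recovers the exact path, and time averages along it should match the posterior mean ȳ and variance 1/n. Note that each iteration costs O(1) in n at the event itself (one data point), which is the mechanism behind the paper's scalability claim; the control-variate refinement that yields super-efficiency is not shown here.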

Article information

Ann. Statist., Volume 47, Number 3 (2019), 1288–1320.

Received: July 2016
Revised: March 2018
First available in Project Euclid: 13 February 2019


Primary: 65C60: Computational problems in statistics
Secondary: 65C05: Monte Carlo methods; 62F15: Bayesian inference; 60J25: Continuous-time Markov processes on general state spaces

Keywords: MCMC; nonreversible Markov process; piecewise deterministic Markov process; stochastic gradient Langevin dynamics; sub-sampling; exact sampling


Bierkens, Joris; Fearnhead, Paul; Roberts, Gareth. The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47 (2019), no. 3, 1288--1320. doi:10.1214/18-AOS1715.



  • Anderson, D. F. (2007). A modified next reaction method for simulating chemical systems with time dependent propensities and delays. J. Chem. Phys. 127 214107.
  • Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist. 37 697–725.
  • Bardenet, R., Doucet, A. and Holmes, C. (2017). On Markov chain Monte Carlo methods for tall data. J. Mach. Learn. Res. 18 1515–1557.
  • Bierkens, J. (2016). Non-reversible Metropolis–Hastings. Stat. Comput. 26 1213–1228.
  • Bierkens, J. (2017). Computer experiments accompanying J. Bierkens, P. Fearnhead and G. Roberts, the Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Available online. Date accessed: 20-10-2017.
  • Bierkens, J., Fearnhead, P. and Roberts, G. (2018). Supplement to “The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data.” DOI:10.1214/18-AOS1715SUPP.
  • Bierkens, J. and Roberts, G. (2017). A piecewise deterministic scaling limit of lifted Metropolis–Hastings in the Curie–Weiss model. Ann. Appl. Probab. 27 846–882.
  • Bierkens, J., Roberts, G. O. and Zitt, P.-A. (2017). Ergodicity of the zigzag process. Preprint. Available at arXiv:1712.09875.
  • Bouchard-Côté, A., Vollmer, S. J. and Doucet, A. (2017). The bouncy particle sampler: A non-reversible rejection-free Markov chain Monte Carlo method. J. Amer. Statist. Assoc. To appear. Available at arXiv:1510.02451.
  • Chen, T.-L. and Hwang, C.-R. (2013). Accelerating reversible Markov chains. Statist. Probab. Lett. 83 1956–1962.
  • Deligiannidis, G., Bouchard-Côté, A. and Doucet, A. (2017). Exponential ergodicity of the bouncy particle sampler. Preprint. Available at arXiv:1705.04579.
  • Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216–222.
  • Dubey, K. A., Reddi, S. J., Williamson, S. A., Poczos, B., Smola, A. J. and Xing, E. P. (2016). Variance reduction in stochastic gradient Langevin dynamics. In Advances in Neural Information Processing Systems 1154–1162.
  • Duncan, A. B., Lelièvre, T. and Pavliotis, G. A. (2016). Variance reduction using nonreversible Langevin samplers. J. Stat. Phys. 163 457–491.
  • Fearnhead, P., Bierkens, J., Pollock, M. and Roberts, G. O. (2018). Piecewise deterministic Markov processes for continuous-time Monte Carlo. Statist. Sci. 33 386–412.
  • Fontbona, J., Guérin, H. and Malrieu, F. (2012). Quantitative estimates for the long-time behavior of an ergodic variant of the telegraph process. Adv. in Appl. Probab. 44 977–994.
  • Fontbona, J., Guérin, H. and Malrieu, F. (2016). Long time behavior of telegraph processes under convex potentials. Stochastic Process. Appl. 126 3077–3101.
  • Gibson, M. A. and Bruck, J. (2000). Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A 104 1876–1889.
  • Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
  • Hwang, C.-R., Hwang-Ma, S.-Y. and Sheu, S. J. (1993). Accelerating Gaussian diffusions. Ann. Appl. Probab. 3 897–913.
  • Jacob, P. E. and Thiery, A. H. (2015). On nonnegative unbiased estimators. Ann. Statist. 43 769–784.
  • Johnson, R. A. (1970). Asymptotic expansions associated with posterior distributions. Ann. Math. Stat. 41 851–864.
  • Lewis, P. A. W. and Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Nav. Res. Logist. Q. 26 403–413.
  • Li, C., Srivastava, S. and Dunson, D. B. (2017). Simple, scalable and accurate posterior interval estimation. Biometrika 104 665–680.
  • Ma, Y.-A., Chen, T. and Fox, E. (2015). A complete recipe for stochastic gradient MCMC. In Advances in Neural Information Processing Systems 2917–2925.
  • Maclaurin, D. and Adams, R. P. (2014). Firefly Monte Carlo: Exact MCMC with subsets of data. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence AUAI Press, Arlington, VA.
  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087.
  • Monmarché, P. (2014). Hypocoercive relaxation to equilibrium for some kinetic models via a third order differential inequality. Available at arXiv:1306.4548.
  • Neal, R. M. (1998). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In Learning in Graphical Models 205–228. Springer, Berlin.
  • Neiswanger, W., Wang, C. and Xing, E. (2013). Asymptotically exact, embarrassingly parallel MCMC. Available at arXiv:1311.4780.
  • Pakman, A., Gilboa, D., Carlson, D. and Paninski, L. (2016). Stochastic bouncy particle sampler. Preprint. Available at arXiv:1609.00770.
  • Peters, E. A. J. F. and De With, G. (2012). Rejection-free Monte Carlo sampling for general potentials. Phys. Rev. E (3) 85 1–5.
  • Pollock, M., Fearnhead, P., Johansen, A. M. and Roberts, G. O. (2016). The scalable Langevin exact algorithm: Bayesian inference for big data. Available at arXiv:1609.03436.
  • Quiroz, M., Villani, M. and Kohn, R. (2015). Speeding up MCMC by efficient data subsampling. Riksbank Research Paper Series 121.
  • Rey-Bellet, L. and Spiliopoulos, K. (2015). Irreversible Langevin samplers and variance reduction: A large deviations approach. Nonlinearity 28 2081–2103.
  • Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363.
  • Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I. and McCulloch, R. E. (2016). Bayes and big data: The consensus Monte Carlo algorithm. Int. J. Manag. Sci. Eng. Manag. 11 78–88.
  • Turitsyn, K. S., Chertkov, M. and Vucelja, M. (2011). Irreversible Monte Carlo algorithms for efficient sampling. Phys. D 240 410–414.
  • Vollmer, S. J., Zygalakis, K. C. and Teh, Y. W. (2016). Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics. J. Mach. Learn. Res. 17 1–48.
  • Wang, X. and Dunson, D. B. (2013). Parallelizing MCMC via Weierstrass sampler. Available at arXiv:1312.4605.
  • Welling, M. and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) 681–688.

Supplemental materials

  • Supplement to “The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data”. Mathematics of the Zig-Zag process, scaling of SGLD, details on the experiments including how to obtain computational bounds.