Electronic Journal of Statistics

Higher order Langevin Monte Carlo algorithm

Sotirios Sabanis and Ying Zhang


Abstract

A new (unadjusted) Langevin Monte Carlo (LMC) algorithm with improved rates in total variation and in Wasserstein distance is presented. All these are obtained in the context of sampling from a target distribution $\pi$ that has a density $\hat{\pi}$ on $\mathbb{R}^{d}$ known up to a normalizing constant. Moreover, $-\log\hat{\pi}$ is assumed to have a locally Lipschitz gradient and its third derivative is locally Hölder continuous with exponent $\beta \in (0,1]$. Non-asymptotic bounds are obtained for the convergence to stationarity of the new sampling method with convergence rate $1+\beta/2$ in Wasserstein distance, while it is shown that the rate is 1 in total variation even in the absence of convexity. Finally, in the case where $-\log \hat{\pi}$ is strongly convex and its gradient is Lipschitz continuous, explicit constants are provided.
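For readers less familiar with Langevin-based sampling, the sketch below shows the standard first order (unadjusted) Langevin Monte Carlo recursion $\theta_{k+1} = \theta_k + h\,\nabla\log\hat{\pi}(\theta_k) + \sqrt{2h}\,\xi_{k+1}$ with $\xi_{k+1} \sim \mathcal{N}(0, I_d)$, which is the baseline that the paper improves upon. This is a minimal illustrative sketch only: the names ula_sample and grad_log_pi, the step size, and the Gaussian example are assumptions made here, and the paper's higher order algorithm, which additionally uses higher derivatives of $-\log\hat{\pi}$ and a taming device for super-linearly growing coefficients, is not reproduced.

```python
import numpy as np

def ula_sample(grad_log_pi, theta0, step_size, n_steps, rng=None):
    """Minimal sketch of the first order unadjusted Langevin Monte Carlo recursion
    theta_{k+1} = theta_k + h * grad_log_pi(theta_k) + sqrt(2h) * xi_{k+1},
    with xi_{k+1} standard Gaussian. Illustrative baseline only; it is not the
    paper's higher order scheme.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    samples = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        noise = rng.standard_normal(theta.size)
        theta = theta + step_size * grad_log_pi(theta) + np.sqrt(2.0 * step_size) * noise
        samples[k] = theta
    return samples

if __name__ == "__main__":
    # Toy usage: standard Gaussian target, where grad log pi(x) = -x.
    draws = ula_sample(grad_log_pi=lambda x: -x, theta0=np.zeros(2),
                       step_size=0.01, n_steps=5000)
    print(draws.mean(axis=0), draws.var(axis=0))
```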

Article information

Source
Electron. J. Statist., Volume 13, Number 2 (2019), 3805-3850.

Dates
Received: November 2018
First available in Project Euclid: 3 October 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1570068044

Digital Object Identifier
doi:10.1214/19-EJS1615

Mathematical Reviews number (MathSciNet)
MR4015336

Zentralblatt MATH identifier
07113731

Subjects
Primary: 62L10 (Sequential analysis); 65C05 (Monte Carlo methods)

Keywords
Markov chain Monte Carlo; higher order algorithm; rate of convergence; machine learning; sampling problem; super-linear coefficients

Rights
Creative Commons Attribution 4.0 International License.

Citation

Sabanis, Sotirios; Zhang, Ying. Higher order Langevin Monte Carlo algorithm. Electron. J. Statist. 13 (2019), no. 2, 3805–3850. doi:10.1214/19-EJS1615. https://projecteuclid.org/euclid.ejs/1570068044

