## Bernoulli


### High-dimensional Bayesian inference via the unadjusted Langevin algorithm

#### Abstract

We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density w.r.t. the Lebesgue measure on $\mathbb{R}^{d}$, known up to a normalization constant $x\mapsto\pi(x)=\mathrm{e}^{-U(x)}/\int_{\mathbb{R}^{d}}\mathrm{e}^{-U(y)}\,\mathrm{d}y$. Such a problem naturally occurs, for example, in Bayesian inference and machine learning. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity in Wasserstein distance of order $2$ and total variation distance of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence on the dimension of the state space of these bounds is explicit. The convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality are reported for functions which are measurable and bounded. An illustration of Bayesian inference for binary regression is presented to support our claims.
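The sampling method studied in the paper is the unadjusted Langevin algorithm (ULA): the Euler discretization of the Langevin SDE $\mathrm{d}X_t=-\nabla U(X_t)\,\mathrm{d}t+\sqrt{2}\,\mathrm{d}B_t$, which iterates $X_{k+1}=X_k-\gamma\nabla U(X_k)+\sqrt{2\gamma}\,Z_{k+1}$ with i.i.d. standard Gaussian $Z_{k+1}$. The following sketch illustrates this recursion with a constant step size; the target (a standard Gaussian, i.e. $U(x)=\|x\|^2/2$) and the step size are illustrative placeholders, not the paper's experimental setup.

```python
import numpy as np

def ula(grad_U, x0, step, n_iter, rng=None):
    """Unadjusted Langevin algorithm with constant step size:
    X_{k+1} = X_k - step * grad_U(X_k) + sqrt(2*step) * Z_{k+1}."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Illustrative strongly log-concave target: pi(x) ∝ exp(-||x||^2 / 2),
# so U(x) = ||x||^2 / 2 and grad_U(x) = x.
samples = ula(lambda x: x, x0=np.zeros(2), step=0.1,
              n_iter=5000, rng=np.random.default_rng(0))
```

Note that, with a constant step size, the chain targets a biased approximation of $\pi$ (no Metropolis correction is applied); the paper quantifies this bias non-asymptotically in Wasserstein-2 and total variation distance.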

#### Article information

Source
Bernoulli, Volume 25, Number 4A (2019), 2854-2882.

Dates
Revised: July 2018
First available in Project Euclid: 13 September 2019

https://projecteuclid.org/euclid.bj/1568362045

Digital Object Identifier
doi:10.3150/18-BEJ1073

Mathematical Reviews number (MathSciNet)
MR4003567

Zentralblatt MATH identifier
07110114

#### Citation

Durmus, Alain; Moulines, Éric. High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25 (2019), no. 4A, 2854--2882. doi:10.3150/18-BEJ1073. https://projecteuclid.org/euclid.bj/1568362045

#### References

• [1] Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669–679.
• [2] Borodin, A.N. and Salminen, P. (2002). Handbook of Brownian Motion—Facts and Formulae, 2nd ed. Probability and Its Applications. Basel: Birkhäuser.
• [3] Bubeck, S., Eldan, R. and Lehec, J. (2015). Finite-time analysis of projected Langevin Monte Carlo. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15 1243–1251. Cambridge, MA, USA: MIT Press.
• [4] Bubley, R., Dyer, M. and Jerrum, M. (1998). An elementary analysis of a procedure for sampling points in a convex body. Random Structures Algorithms 12 213–235.
• [5] Chen, M.F. and Li, S.F. (1989). Coupling methods for multidimensional diffusion processes. Ann. Probab. 17 151–177.
• [6] Choi, H.M. and Hobert, J.P. (2013). The Polya-gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron. J. Stat. 7 2054–2064.
• [7] Chopin, N. and Ridgway, J. (2017). Leave Pima Indians alone: Binary regression as a benchmark for Bayesian computation. Statist. Sci. 32 64–87.
• [8] Dalalyan, A.S. (2017). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Proceedings of the 30th Annual Conference on Learning Theory.
• [9] Dalalyan, A.S. (2017). Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 651–676.
• [10] Durmus, A. and Moulines, É. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27 1551–1587.
• [11] Durmus, A. and Moulines, É. (2019). Supplement to “High-dimensional Bayesian inference via the unadjusted Langevin algorithm.” DOI:10.3150/18-BEJ1073SUPP.
• [12] Eberle, A. Quantitative contraction rates for Markov chains on continuous state spaces. In preparation.
• [13] Eberle, A. (2016). Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 166 851–886.
• [14] Eberle, A., Guillin, A. and Zimmer, R. (2018). Quantitative Harris type theorems for diffusions and McKean–Vlasov processes. Trans. Amer. Math. Soc. To appear.
• [15] Ermak, D.L. (1975). A computer simulation of charged particles in solution. I. Technique and equilibrium properties. J. Chem. Phys. 62 4189–4196.
• [16] Faes, C., Ormerod, J.T. and Wand, M.P. (2011). Variational Bayesian inference for parametric and nonparametric regression with missing data. J. Amer. Statist. Assoc. 106 959–971.
• [17] Frühwirth-Schnatter, S. and Frühwirth, R. (2010). Data augmentation and MCMC for binary and multinomial logit models. In Statistical Modelling and Regression Structures 111–132. Heidelberg: Physica-Verlag/Springer.
• [18] Gramacy, R.B. and Polson, N.G. (2012). Simulation-based regularized logistic regression. Bayesian Anal. 7 567–589.
• [19] Grenander, U. (1996). Elements of Pattern Theory. Johns Hopkins Studies in the Mathematical Sciences. Baltimore, MD: Johns Hopkins Univ. Press.
• [20] Grenander, U. and Miller, M.I. (1994). Representations of knowledge in complex systems. J. Roy. Statist. Soc. Ser. B 56 549–603. With discussion and a reply by the authors.
• [21] Hanson, T.E., Branscum, A.J. and Johnson, W.O. (2014). Informative $g$-priors for logistic regression. Bayesian Anal. 9 597–611.
• [22] Holmes, C.C. and Held, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 1 145–168.
• [23] Joulin, A. and Ollivier, Y. (2010). Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38 2418–2442.
• [24] Karatzas, I. and Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Graduate Texts in Mathematics 113. New York: Springer.
• [25] Klartag, B. (2007). A central limit theorem for convex sets. Invent. Math. 168 91–131.
• [26] Lamberton, D. and Pagès, G. (2002). Recursive computation of the invariant distribution of a diffusion. Bernoulli 8 367–405.
• [27] Lamberton, D. and Pagès, G. (2003). Recursive computation of the invariant distribution of a diffusion: The case of a weakly mean reverting drift. Stoch. Dyn. 3 435–451.
• [28] Lemaire, V. (2005). Estimation de la mesure invariante d’un processus de diffusion. Ph.D. thesis, Université Paris-Est.
• [29] Lindvall, T. and Rogers, L.C.G. (1986). Coupling of multidimensional diffusions by reflection. Ann. Probab. 14 860–872.
• [30] Mattingly, J.C., Stuart, A.M. and Higham, D.J. (2002). Ergodicity for SDEs and approximations: Locally Lipschitz vector fields and degenerate noise. Stochastic Process. Appl. 101 185–232.
• [31] Neal, R.M. (1993). Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems 5, [NIPS Conference] 475–482. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
• [32] Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization 87. Boston, MA: Kluwer Academic.
• [33] Parisi, G. (1981). Correlation functions and computer simulations. Nuclear Phys. B 180 378–384.
• [34] Polson, N.G., Scott, J.G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339–1349.
• [35] Roberts, G.O. and Tweedie, R.L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363.
• [36] Rossky, P.J., Doll, J.D. and Friedman, H.L. (1978). Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69 4628–4633.
• [37] Sabanés Bové, D. and Held, L. (2011). Hyper-$g$ priors for generalized linear models. Bayesian Anal. 6 387–410.
• [38] Talay, D. and Tubaro, L. (1990). Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8 483–509.
• [39] Villani, C. (2009). Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 338. Berlin: Springer.
• [40] Welling, M. and Teh, Y.W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) 681–688.
• [41] Windle, J., Polson, N.G. and Scott, J.G. (2013). BayesLogit: Bayesian logistic regression. R package version 0.2. Available at http://cran.r-project.org/web/packages/BayesLogit/index.html.

#### Supplemental materials

• Supplement to “High-dimensional Bayesian inference via the unadjusted Langevin algorithm”. Most proofs and derivations are deferred to this supplementary paper.