Diffusion limit for the random walk Metropolis algorithm out of stationarity

Juan Kuntz; Michela Ottobre; Andrew M. Stuart

doi:10.1214/18-AIHP929

Ann. Inst. H. Poincaré Probab. Statist. 55(3): 1599-1648 (August 2019). DOI: 10.1214/18-AIHP929

Abstract

The Random Walk Metropolis (RWM) algorithm is a Metropolis–Hastings Markov Chain Monte Carlo algorithm designed to sample from a given target distribution $\pi^{N}$ with Lebesgue density on $\mathbb{R}^{N}$. Like any other Metropolis–Hastings algorithm, RWM constructs a Markov chain by randomly proposing a new position (the “proposal move”), which is then accepted or rejected according to a rule that makes the chain reversible with respect to $\pi^{N}$. When the dimension $N$ is large, a key question is how the proposal variance should scale with $N$: if the proposal variance is too large, the algorithm will reject the proposed moves too often; if it is too small, the algorithm will explore the state space too slowly. Determining the optimal scaling of the proposal variance also gives a measure of the cost of the algorithm. One approach to this issue, which we adopt here, is to derive diffusion limits for the algorithm. Such an approach was proposed in the seminal papers (Ann. Appl. Probab. 7 (1) (1997) 110–120; J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 (1) (1998) 255–268). In particular, in (Ann. Appl. Probab. 7 (1) (1997) 110–120) the authors derive a diffusion limit for the RWM algorithm under the following two assumptions: (i) the algorithm is started in stationarity; (ii) the target measure $\pi^{N}$ is in product form. The present paper considers the situation of practical interest in which both assumptions (i) and (ii) are removed. That is, (a) we study the case (which occurs in practice) in which the algorithm is started out of stationarity and (b) we consider target measures which are in non-product form. Roughly speaking, we consider target measures that admit a density with respect to a Gaussian measure; such measures arise in Bayesian nonparametric statistics and in the study of conditioned diffusions. We prove that, out of stationarity, the optimal scaling for the proposal variance is $O(N^{-1})$, as it is in stationarity. In this optimal scaling, a diffusion limit is obtained and the cost of reaching and exploring the invariant measure scales as $O(N)$. Notice that the optimal scalings in and out of stationarity need not be the same in general, and indeed they differ, e.g., in the case of the MALA algorithm (Stoch. Partial Differ. Equ. Anal. Comput. 6 (3) (2018) 446–499). More importantly, our diffusion limit is given by a stochastic PDE, coupled to a scalar ordinary differential equation; this ODE gives a measure of how far from stationarity the process is and can therefore be taken as an indicator of convergence. In this sense, this paper contributes to the understanding of the long-standing problem of monitoring convergence of MCMC algorithms.
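
The propose-then-accept-or-reject mechanism and the $O(N^{-1})$ proposal variance described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the target is a standard Gaussian on $\mathbb{R}^{N}$ (a product-form stand-in, chosen only for simplicity, whereas the paper treats non-product targets), the per-coordinate proposal variance is `ell**2 / N` with a hypothetical tuning constant `ell`, and the chain is started away from the bulk of the target. The names `log_target` and `rwm`, and the monitored quantity $|x|^{2}/N$, are illustrative choices and are not the scalar ODE of the paper.

```python
import math
import random


def log_target(x):
    """Log-density, up to a constant, of a standard Gaussian on R^N.

    Assumption: this simple target is for illustration only.
    """
    return -0.5 * sum(xi * xi for xi in x)


def rwm(x0, n_steps, ell=1.0, seed=0):
    """Random Walk Metropolis with per-coordinate proposal variance ell**2 / N.

    Returns the final state, the empirical acceptance rate, and a trace of
    |x|^2 / N recorded after every step (an illustrative diagnostic).
    """
    rng = random.Random(seed)
    n = len(x0)
    step = ell / math.sqrt(n)  # standard deviation; the variance is ell**2 / N
    x = list(x0)
    logp = log_target(x)
    accepted = 0
    trace = []
    for _ in range(n_steps):
        # Proposal move: add centred Gaussian noise to the current position.
        y = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        logp_y = log_target(y)
        # Metropolis rule: accept with probability min(1, pi(y) / pi(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_y - logp:
            x, logp = y, logp_y
            accepted += 1
        trace.append(sum(xi * xi for xi in x) / n)
    return x, accepted / n_steps, trace


if __name__ == "__main__":
    N = 50
    # Start at (3, ..., 3), where |x|^2 / N = 9, far from its typical value
    # of about 1 under the standard Gaussian target.
    x_final, acc_rate, trace = rwm([3.0] * N, n_steps=20 * N)
    print(f"acceptance rate: {acc_rate:.2f}")
    print(f"|x|^2 / N at the last step: {trace[-1]:.2f}")
```

Running the chain for a number of steps proportional to $N$, as in the usage above, mirrors the $O(N)$ cost stated in the abstract; watching `trace` relax from its starting value gives an informal picture of the chain approaching stationarity.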

Citation

Juan Kuntz, Michela Ottobre, Andrew M. Stuart. "Diffusion limit for the random walk Metropolis algorithm out of stationarity." Ann. Inst. H. Poincaré Probab. Statist. 55 (3) 1599–1648, August 2019. https://doi.org/10.1214/18-AIHP929