Open Access
Convergence rate of Markov chain methods for genomic motif discovery
Dawn B. Woodard, Jeffrey S. Rosenthal
Ann. Statist. 41(1): 91-124 (February 2013). DOI: 10.1214/12-AOS1075

Abstract

We analyze the convergence rate of a simplified version of a popular Gibbs sampling method used for statistical discovery of gene regulatory binding motifs in DNA sequences. This sampler is uniformly ergodic, a very strong form of ergodicity. However, we show that, due to multimodality of the posterior distribution, the rate of convergence often decreases exponentially as a function of the length of the DNA sequence. Specifically, we show that this occurs whenever there is more than one true repeating pattern in the data. In biological data there are typically multiple such patterns, and the goal is to detect the best-conserved and most frequently occurring of these. Our findings match empirical results, in which the motif-discovery Gibbs sampler has exhibited such poor convergence that it is used only for finding modes of the posterior distribution (candidate motifs) rather than for obtaining samples from that distribution. Ours are some of the first meaningful bounds on the convergence rate of a Markov chain method for sampling from a multimodal posterior distribution, as a function of statistical quantities like the number of observations.
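The phenomenon the abstract describes (a chain that is uniformly ergodic yet crosses between posterior modes exponentially slowly as the data grow) can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the paper's motif-discovery Gibbs sampler: it swaps in a random-walk Metropolis chain on the states {0, ..., K} with a bimodal target proportional to exp(n * g(x)), where the hypothetical parameter n stands in for the amount of data and g(x) = (x/K - 1/2)^2 places modes at both ends. The hypothetical function `crossing_time` computes the exact expected number of steps to travel from one mode to the other, using the standard birth-death identity E_k[T_{k+1}] = (pi(0) + ... + pi(k)) / (pi(k) p_k).

```python
import math


def crossing_time(n, K=10):
    """Expected steps for a toy Metropolis chain to go from mode x=0 to mode x=K.

    Toy (hypothetical) target on {0, ..., K}: pi_n(x) proportional to exp(n * g(x)),
    with g(x) = (x/K - 1/2)^2, so there are modes at both ends and a valley in the
    middle. Larger n gives a deeper valley, mimicking a posterior that
    concentrates as the number of observations grows.
    """
    w = [math.exp(n * (x / K - 0.5) ** 2) for x in range(K + 1)]  # unnormalized pi_n
    total = 0.0
    prefix = 0.0  # running sum pi(0) + ... + pi(k)
    for k in range(K):
        prefix += w[k]
        # From k, propose k+1 with prob 1/2; accept with prob min(1, pi(k+1)/pi(k)).
        p_up = 0.5 * min(1.0, w[k + 1] / w[k])
        # Birth-death identity for a reversible chain: E_k[T_{k+1}] = prefix / (pi(k) * p_up).
        # The chain must pass through every intermediate state, so these times add up.
        total += prefix / (w[k] * p_up)
    return total


if __name__ == "__main__":
    for n in [0, 10, 20, 40, 80]:
        # n = 0 (flat target) gives exactly 110.0; the time then grows roughly like exp(n / 4).
        print(n, crossing_time(n))
```

Because the toy chain lives on a finite state space, it is uniformly ergodic for every n, yet its mode-to-mode crossing time grows roughly like exp(n/4), the depth of the valley. This mirrors, in a deliberately simplified setting, why uniform ergodicity alone does not guarantee fast convergence.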

Citation


Dawn B. Woodard. Jeffrey S. Rosenthal. "Convergence rate of Markov chain methods for genomic motif discovery." Ann. Statist. 41 (1) 91 - 124, February 2013. https://doi.org/10.1214/12-AOS1075

Information

Published: February 2013
First available in Project Euclid: 5 March 2013

zbMATH: 1347.62048
MathSciNet: MR3059411
Digital Object Identifier: 10.1214/12-AOS1075

Subjects:
Primary: 62F15
Secondary: 60J10

Keywords: DNA, Gibbs sampler, multimodal, slow mixing, spectral gap

Rights: Copyright © 2013 Institute of Mathematical Statistics
