Open Access
December 2009
Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times
Yonil Park, Sergey Sheetlin, John L. Spouge
Ann. Statist. 37(6A): 3697-3714 (December 2009). DOI: 10.1214/08-AOS663


The gapped local alignment score of two random sequences follows a Gumbel distribution. If computers could estimate the parameters of the Gumbel distribution within one second, the use of arbitrary alignment scoring schemes could increase the sensitivity of searching biological sequence databases over the web. Accordingly, this article gives a novel equation for the scale parameter of the relevant Gumbel distribution. We speculate that the equation is exact, although present numerical evidence is limited. The equation involves ascending ladder variates in the global alignment of random sequences. In global alignment simulations, the ladder variates yield stopping times specifying random sequence lengths. Because of the random lengths, and because our trial distribution for importance sampling occurs on a different sample space from our target distribution, our study led to a mapping theorem, which led naturally in turn to an efficient dynamic programming algorithm for the importance sampling weights. Numerical studies using several popular alignment scoring schemes then examined the efficiency and accuracy of the resulting simulations.
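As a rough illustration of importance sampling with a stopping time, the sketch below treats a much simpler setting than the gapped alignments of the article: a one-dimensional random walk with negative drift. It is a toy analogue, not the authors' algorithm. It assumes integer step sizes with known probabilities, solves E[exp(λX)] = 1 for the positive root λ (the quantity that plays the role of the Gumbel scale parameter in the ungapped case), simulates under the exponentially tilted step distribution, stops each run when the walk first reaches a level y, and averages the likelihood-ratio weights exp(-λ S_T). The function names `solve_lambda` and `estimate_crossing_probability` are hypothetical and chosen only for this example.

```python
import math
import random


def solve_lambda(steps, probs, hi=50.0, tol=1e-12):
    """Positive root of E[exp(lam * X)] = 1, found by bisection.

    Assumes E[X] < 0 and P(X > 0) > 0, so the root exists and is unique,
    and assumes the root lies below `hi`.
    """
    def f(lam):
        return sum(p * math.exp(lam * x) for x, p in zip(steps, probs)) - 1.0

    lo = 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid  # still left of the positive root
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_crossing_probability(steps, probs, y, n_runs, rng):
    """Estimate P(walk ever reaches level y) by importance sampling.

    The trial distribution tilts each step probability by exp(lam * x),
    which gives the walk positive drift, so every run stops in finite time.
    """
    lam = solve_lambda(steps, probs)
    tilted = [p * math.exp(lam * x) for x, p in zip(steps, probs)]
    total = 0.0
    for _ in range(n_runs):
        s = 0
        while s < y:  # stopping time: first passage to level y or above
            s += rng.choices(steps, weights=tilted)[0]
        total += math.exp(-lam * s)  # likelihood ratio of target to trial
    return total / n_runs, lam


if __name__ == "__main__":
    rng = random.Random(0)
    est, lam = estimate_crossing_probability([1, -1], [0.3, 0.7], 5, 1000, rng)
    print(est, lam)
```

For steps of +1 and -1 the walk cannot overshoot the level, so every weight equals exp(-λy) and the estimate matches the gambler's-ruin value (p/q)^y with no sampling variance. With larger upward steps the overshoot makes the weights random, which is the situation in which the sample size and the choice of trial distribution start to matter.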



Yonil Park, Sergey Sheetlin, John L. Spouge. "Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times." Ann. Statist. 37(6A): 3697-3714, December 2009.


Published: December 2009
First available in Project Euclid: 17 August 2009

zbMATH: 1369.62255
MathSciNet: MR2549575
Digital Object Identifier: 10.1214/08-AOS663

Primary: 62M99
Secondary: 92-08

Keywords: gapped sequence alignment, Gumbel scale parameter estimation, importance sampling, Markov additive process, Markov renewal process, stopping time

Rights: Copyright © 2009 Institute of Mathematical Statistics
