Open Access
February, 1994
A Phase Transition for the Score in Matching Random Sequences Allowing Deletions
Richard Arratia, Michael S. Waterman
Ann. Appl. Probab. 4(1): 200-225 (February, 1994). DOI: 10.1214/aoap/1177005208

Abstract

We consider a sequence matching problem involving the optimal alignment score for contiguous subsequences, rewarding matches and penalizing for deletions and mismatches. This score is used by biologists comparing pairs of DNA or protein sequences. We prove that for two sequences of length $n$, as $n \rightarrow \infty$, there is a phase transition between linear growth in $n$, when the penalty parameters are small, and logarithmic growth in $n$, when the penalties are large. The results are valid for independent sequences with iid or Markov letters. The crucial step in proving this is to derive a large deviation result for matching with deletions. The longest common subsequence problem of Chvátal and Sankoff is a special case of our setup. The proof of the large deviation result exploits the Azuma-Hoeffding lemma. The phase transition is also established for more general scoring schemes allowing general letter-to-letter alignment penalties and block deletion penalties. We give a general method for applying the bounded increments martingale method to Lipschitz functionals of Markov processes. The phase transition holds for matching Markov chains and for nonoverlapping repeats in a single sequence.
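
The score described in the abstract can be illustrated with a short dynamic program. The sketch below is not taken from the paper: it is a standard local-alignment recursion of the Smith-Waterman type, written under the assumption that each match scores +1, each mismatch costs `mu`, and each deleted letter costs `delta`. The function name and both parameter names are hypothetical, chosen only for this illustration.

```python
def local_alignment_score(a, b, mu=1.0, delta=1.0):
    """Best alignment score over all pairs of contiguous substrings of a and b.

    Assumed scoring (illustrative, not quoted from the paper):
    +1 per aligned match, -mu per aligned mismatch, -delta per deleted letter.
    """
    prev = [0.0] * (len(b) + 1)   # scores for the previous row of the table
    best = 0.0                    # the empty alignment always scores 0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = 1.0 if a[i - 1] == b[j - 1] else -mu
            curr[j] = max(
                0.0,                  # start a fresh alignment here
                prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                prev[j] - delta,      # delete a[i-1]
                curr[j - 1] - delta,  # delete b[j-1]
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Under these assumptions, setting `delta=0` and `mu=float("inf")` makes deletions free and forbids mismatches, so the returned value is the length of a longest common subsequence, which is the special case the abstract mentions. Small penalties let long alignments accumulate score, while large penalties favor short, nearly exact matches; this is the contrast behind the linear and logarithmic regimes.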

Citation

Richard Arratia, Michael S. Waterman. "A Phase Transition for the Score in Matching Random Sequences Allowing Deletions." Ann. Appl. Probab. 4 (1) 200-225, February, 1994. https://doi.org/10.1214/aoap/1177005208

Information

Published: February, 1994
First available in Project Euclid: 19 April 2007

zbMATH: 0809.62008
MathSciNet: MR1258181
Digital Object Identifier: 10.1214/aoap/1177005208

Subjects:
Primary: 62E20
Secondary: 62P10

Keywords: Azuma-Hoeffding, large deviations, longest common subsequence, percolation, phase transition, sequence matching

Rights: Copyright © 1994 Institute of Mathematical Statistics
