## Bernoulli

• Volume 24, Number 2 (2018), 971–992.

### An upper bound on the convergence rate of a second functional in optimal sequence alignment

#### Abstract

Consider finite sequences $X_{[1,n]}=X_{1},\ldots,X_{n}$ and $Y_{[1,n]}=Y_{1},\ldots,Y_{n}$ of length $n$, consisting of i.i.d. samples of random letters from a finite alphabet, and let $S$ and $T$ be chosen i.i.d. randomly from the unit ball in the space of symmetric scoring functions over this alphabet augmented by a gap symbol. We prove a probabilistic upper bound of linear order in $(\ln(n))^{1/4}n^{3/4}$ for the deviation of the score relative to $T$ of optimal alignments with gaps of $X_{[1,n]}$ and $Y_{[1,n]}$ relative to $S$. It remains an open problem to prove a lower bound. Our result contributes to the understanding of the microstructure of optimal alignments relative to one given scoring function, extending a theory begun in (J. Stat. Phys. 153 (2013) 512–529).
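To make the two roles of $S$ and $T$ concrete, the following is a minimal sketch, not the authors' code. It computes one optimal alignment with gaps relative to $S$ using the standard Needleman–Wunsch recursion (reference [7]) and then evaluates that same alignment under $T$. The function names, the use of the Euclidean unit ball, and the tie-breaking order in the traceback are illustrative assumptions; the abstract does not specify the norm or how ties are resolved.

```python
import math
import random

GAP = "-"  # gap symbol appended to the alphabet (the symbol itself is an assumption)


def random_scoring(alphabet, rng):
    """Draw a symmetric scoring function uniformly from a unit ball.

    Assumption: the ball is Euclidean over the free entries (a, b), a <= b,
    of the symmetric score table; the abstract does not name the norm.
    """
    symbols = list(alphabet) + [GAP]
    pairs = [(a, b) for i, a in enumerate(symbols) for b in symbols[i:]]
    g = [rng.gauss(0.0, 1.0) for _ in pairs]
    norm = math.sqrt(sum(v * v for v in g))
    # Uniform direction times U^(1/d) gives a uniform point in the d-ball.
    radius = rng.random() ** (1.0 / len(pairs))
    table = {}
    for (a, b), v in zip(pairs, g):
        table[(a, b)] = table[(b, a)] = radius * v / norm
    return table


def optimal_alignment(x, y, s):
    """Needleman-Wunsch: return one alignment of x and y maximising the score under s."""
    n, m = len(x), len(y)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + s[(x[i - 1], GAP)]
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + s[(GAP, y[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + s[(x[i - 1], y[j - 1])],
                best[i - 1][j] + s[(x[i - 1], GAP)],
                best[i][j - 1] + s[(GAP, y[j - 1])],
            )
    # Trace back from the corner; ties are broken diagonal first (arbitrary choice).
    aligned, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + s[(x[i - 1], y[j - 1])]:
            aligned.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + s[(x[i - 1], GAP)]:
            aligned.append((x[i - 1], GAP))
            i -= 1
        else:
            aligned.append((GAP, y[j - 1]))
            j -= 1
    aligned.reverse()
    return aligned


def alignment_score(aligned, t):
    """Score a fixed alignment (list of aligned symbol pairs) under scoring function t."""
    return sum(t[pair] for pair in aligned)


def second_functional(x, y, s, t):
    """Score under t of an alignment of x and y that is optimal under s."""
    return alignment_score(optimal_alignment(x, y, s), t)


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = "ACGT"  # hypothetical alphabet for illustration
    n = 200
    x = [rng.choice(alphabet) for _ in range(n)]
    y = [rng.choice(alphabet) for _ in range(n)]
    s = random_scoring(alphabet, rng)
    t = random_scoring(alphabet, rng)
    print(second_functional(x, y, s, t))
```

Repeating the last step over many independent draws of the two sequences, for fixed $S$ and $T$, gives an empirical view of how much the $T$-score fluctuates. The quadratic-time recursion limits this sketch to moderate $n$.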

#### Article information

Source
Bernoulli, Volume 24, Number 2 (2018), 971–992.

Dates
Revised: November 2015
First available in Project Euclid: 21 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1505980885

Digital Object Identifier
doi:10.3150/16-BEJ823

Mathematical Reviews number (MathSciNet)
MR3706783

Zentralblatt MATH identifier
06778354

#### Citation

Hauser, Raphael; Matzinger, Heinrich; Popescu, Ionel. An upper bound on the convergence rate of a second functional in optimal sequence alignment. Bernoulli 24 (2018), no. 2, 971–992. doi:10.3150/16-BEJ823. https://projecteuclid.org/euclid.bj/1505980885

#### References

• [1] Azuma, K. (1967). Weighted sums of certain dependent random variables. Tôhoku Math. J. (2) 19 357–367.
• [2] Chvátal, V. and Sankoff, D. (1975). Longest common subsequences of two random sequences. J. Appl. Probab. 12 306–315.
• [3] Evans, L.C. and Gariepy, R.F. (1992). Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. Boca Raton, FL: CRC Press.
• [4] Hauser, R. and Matzinger, H. (2013). Letter change bias and local uniqueness in optimal sequence alignments. J. Stat. Phys. 153 512–529.
• [5] Kesten, H., ed. (2004). Probability on Discrete Structures. Encyclopaedia of Mathematical Sciences 110. Berlin: Springer.
• [6] Lember, J. and Matzinger, H. (2012). Detecting the homology of DNA-sequences based on the variety of optimal alignments: A case study. Available at arXiv:1210.3771 [stat.AP].
• [7] Needleman, S.B. and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48 443–453.
• [8] Waterman, M.S. and Vingron, M. (1994). Sequence comparison significance and Poisson approximation. Statist. Sci. 9 367–381.