Bernoulli

  • Volume 24, Number 2 (2018), 971-992.

An upper bound on the convergence rate of a second functional in optimal sequence alignment

Raphael Hauser, Heinrich Matzinger, and Ionel Popescu


Abstract

Consider finite sequences $X_{[1,n]}=X_{1},\ldots,X_{n}$ and $Y_{[1,n]}=Y_{1},\ldots,Y_{n}$ of length $n$, consisting of i.i.d. samples of random letters from a finite alphabet, and let $S$ and $T$ be chosen i.i.d. randomly from the unit ball in the space of symmetric scoring functions over this alphabet augmented by a gap symbol. We prove a probabilistic upper bound of linear order in $(\ln(n))^{1/4}n^{3/4}$ for the deviation of the score relative to $T$ of optimal alignments with gaps of $X_{[1,n]}$ and $Y_{[1,n]}$ relative to $S$. It remains an open problem to prove a lower bound. Our result contributes to the understanding of the microstructure of optimal alignments relative to one given scoring function, extending a theory begun in (J. Stat. Phys. 153 (2013) 512–529).
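The setting of the abstract can be illustrated with a small sketch: draw two symmetric scoring functions $S$ and $T$, find an alignment with gaps that is optimal relative to $S$, and evaluate that same alignment relative to $T$. Everything named below (`random_symmetric_score`, `optimal_alignment`, `alignment_score`, `second_functional`, the gap symbol `"-"`) is hypothetical and not taken from the paper. The sketch assumes the Euclidean norm on the unordered letter pairs for the unit ball, uses the standard Needleman-Wunsch dynamic program [7], and breaks ties between optimal alignments arbitrarily. It does not reproduce the paper's definition of the deviation or its bound.

```python
import itertools
import math
import random

GAP = "-"  # assumed gap symbol; must not be a letter of the alphabet


def random_symmetric_score(alphabet, rng):
    """Sample a symmetric scoring function uniformly from the Euclidean
    unit ball over unordered symbol pairs (assumed parameterization)."""
    symbols = list(alphabet) + [GAP]
    pairs = [p for p in itertools.combinations_with_replacement(symbols, 2)
             if p != (GAP, GAP)]  # a gap is never aligned with a gap
    gauss = [rng.gauss(0.0, 1.0) for _ in pairs]
    norm = math.sqrt(sum(v * v for v in gauss))
    radius = rng.random() ** (1.0 / len(pairs))  # uniform in the ball
    score = {}
    for (a, b), v in zip(pairs, gauss):
        score[(a, b)] = score[(b, a)] = radius * v / norm
    return score


def optimal_alignment(x, y, score):
    """Needleman-Wunsch: return one score-maximizing alignment of x and y
    as a list of aligned pairs, where GAP marks an unaligned letter."""
    n, m = len(x), len(y)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + score[(x[i - 1], GAP)]
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + score[(GAP, y[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[(x[i - 1], y[j - 1])],
                best[i - 1][j] + score[(x[i - 1], GAP)],
                best[i][j - 1] + score[(GAP, y[j - 1])],
            )
    # Trace back from the corner, re-evaluating the same sums used above.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                best[i][j] == best[i - 1][j - 1] + score[(x[i - 1], y[j - 1])]):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + score[(x[i - 1], GAP)]:
            pairs.append((x[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, y[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def alignment_score(pairs, score):
    """Total score of a fixed alignment relative to a scoring function."""
    return sum(score[p] for p in pairs)


def second_functional(x, y, s, t):
    """Score relative to t of an alignment that is optimal relative to s."""
    return alignment_score(optimal_alignment(x, y, s), t)
```

For example, with `rng = random.Random(0)`, two scoring functions drawn by `random_symmetric_score("ab", rng)`, and two i.i.d. letter sequences of length $n$ over the same alphabet, `second_functional(x, y, S, T)` returns the quantity whose fluctuations the abstract bounds. Repeating the draw of the sequences for growing $n$ would give a rough empirical picture of those fluctuations, at a cost of order $n^{2}$ per alignment.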

Article information

Source
Bernoulli, Volume 24, Number 2 (2018), 971-992.

Dates
Received: September 2014
Revised: November 2015
First available in Project Euclid: 21 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1505980885

Digital Object Identifier
doi:10.3150/16-BEJ823

Mathematical Reviews number (MathSciNet)
MR3706783

Zentralblatt MATH identifier
06778354

Keywords
convex geometry; large deviations; percolation theory; sequence alignment

Citation

Hauser, Raphael; Matzinger, Heinrich; Popescu, Ionel. An upper bound on the convergence rate of a second functional in optimal sequence alignment. Bernoulli 24 (2018), no. 2, 971–992. doi:10.3150/16-BEJ823. https://projecteuclid.org/euclid.bj/1505980885



References

  • [1] Azuma, K. (1967). Weighted sums of certain dependent random variables. Tôhoku Math. J. (2) 19 357–367.
  • [2] Chvátal, V. and Sankoff, D. (1975). Longest common subsequences of two random sequences. J. Appl. Probab. 12 306–315.
  • [3] Evans, L.C. and Gariepy, R.F. (1992). Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. Boca Raton, FL: CRC Press.
  • [4] Hauser, R. and Matzinger, H. (2013). Letter change bias and local uniqueness in optimal sequence alignments. J. Stat. Phys. 153 512–529.
  • [5] Kesten, H., ed. (2004). Probability on Discrete Structures. Encyclopaedia of Mathematical Sciences 110. Berlin: Springer.
  • [6] Lember, J. and Matzinger, H. (2012). Detecting the homology of DNA-sequences based on the variety of optimal alignments: A case study. Available at arXiv:1210.3771 [stat.AP].
  • [7] Needleman, S.B. and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48 443–453.
  • [8] Waterman, M.S. and Vingron, M. (1994). Sequence comparison significance and Poisson approximation. Statist. Sci. 9 367–381.