The Annals of Probability

Critical Phenomena in Sequence Matching

Richard Arratia and Michael S. Waterman

Abstract

We give a generalization of the result of Erdős and Rényi on the length $R_n$ of the longest head run in the first $n$ tosses of a coin. Consider two independent sequences, $X_1 X_2 \cdots X_m$ and $Y_1 Y_2 \cdots Y_n$. Suppose that $X_1, X_2, \ldots$ are i.i.d. $\mu$, and $Y_1, Y_2, \ldots$ are i.i.d. $\nu$, where $\mu$ and $\nu$ are possibly different distributions on a common finite alphabet $S$. Let $p \equiv P(X_1 = Y_1) \in (0, 1)$. The length of the longest matching consecutive subsequence is $M_{m,n} \equiv \max\{k : X_{i+r} = Y_{j+r} \text{ for } r = 1 \text{ to } k, \text{ for some } 0 \leq i \leq m - k,\ 0 \leq j \leq n - k\}$. For $m$ and $n \rightarrow \infty$ with $\log(m)/\log(mn) \rightarrow \lambda \in (0,1)$, our result is that there is a constant $K \equiv K(\mu, \nu, \lambda) \in (0, 1]$ such that $P(\lim M_{m,n}/\log_{1/p}(mn) = K) = 1$. The proof uses large deviation methods. The constant $K$ is determined from a variational formula involving the Kullback-Leibler distance, or relative entropy. A simple necessary and sufficient condition for $K = 1$ is given. For the case $m = n$ ($\lambda = 1/2$) and $\mu = \nu$, $K = 1$. The set of $(\mu, \nu, \lambda)$ for which $K = 1$ has nonempty interior. The boundary of this set is the location of a phase transition. The results generalize to more than two sequences and to Markov chains. A strong law of large numbers is given for the proportion of letters within the longest matching word; the limiting proportion exhibits critical behavior, similar to that of $K$.
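As an illustration of the quantities defined in the abstract, the following Python sketch computes $M_{m,n}$ for two simulated sequences and the ratio $M_{m,n}/\log_{1/p}(mn)$. It is not taken from the paper: the function names `longest_match`, `match_probability`, and `simulate`, the dictionary representation of $\mu$ and $\nu$, and the quadratic-time dynamic program are all assumptions made for this example.

```python
import math
import random


def longest_match(x, y):
    """Return M: the length of the longest run of consecutive positions
    on which a stretch of x agrees letter by letter with a stretch of y."""
    best = 0
    # prev[j] = length of the matching run ending at the previous letter
    # of x and at y[j-1]; index 0 is a zero sentinel.
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def match_probability(mu, nu):
    """p = P(X_1 = Y_1) for independent letters with laws mu and nu,
    each given as a dict mapping letter -> probability."""
    return sum(prob * nu.get(letter, 0.0) for letter, prob in mu.items())


def simulate(mu, nu, m, n, seed=0):
    """Draw X_1..X_m i.i.d. mu and Y_1..Y_n i.i.d. nu, and return
    (M, M / log_{1/p}(mn)). The seed is an arbitrary illustrative choice."""
    rng = random.Random(seed)
    x = rng.choices(list(mu), weights=list(mu.values()), k=m)
    y = rng.choices(list(nu), weights=list(nu.values()), k=n)
    p = match_probability(mu, nu)
    big_m = longest_match(x, y)
    return big_m, big_m / (math.log(m * n) / math.log(1.0 / p))


if __name__ == "__main__":
    uniform = {s: 0.25 for s in "ACGT"}
    print(simulate(uniform, uniform, m=2000, n=2000))
```

With $\mu = \nu$ and $m = n$, the abstract states that $K = 1$, so the printed ratio would be expected to approach 1 as $m$ and $n$ grow. Convergence is on a logarithmic scale, so a single run at moderate length gives only a rough indication.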

Article information

Source
Ann. Probab., Volume 13, Number 4 (1985), 1236-1249.

Dates
First available in Project Euclid: 19 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aop/1176992808

Digital Object Identifier
doi:10.1214/aop/1176992808

Mathematical Reviews number (MathSciNet)
MR806221

Zentralblatt MATH identifier
0576.60058

Subjects
Primary: 60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
Secondary: 68G10; 94A17: Measures of information, entropy

Keywords
Entropy; Kullback-Leibler distance; large deviations; sequence matching

Citation

Arratia, Richard; Waterman, Michael S. Critical Phenomena in Sequence Matching. Ann. Probab. 13 (1985), no. 4, 1236--1249. doi:10.1214/aop/1176992808. https://projecteuclid.org/euclid.aop/1176992808

