The Annals of Applied Probability

$r$-scan statistics of a marker array in multiple sequences derived from a common progenitor

Chingfer Chen and Samuel Karlin

Abstract

This study is motivated by problems of molecular sequence comparisons for biological traits conserved or lost over evolution time. A marker of interest is distributed in the genome of the ancestor and inherited among $l$ offspring species which descend from this common ancestor. Each marker will be retained or lost during the evolution of the descendent species. The objective of the analysis here is to ascertain probabilities of clustering or overdispersion of the marker array among the sequences of the descendent species. Limiting distributions for the extremal $r$-scan statistics (defined in text) of the trait distributed among the $l$ dependent offspring processes are derived by adapting the Chen–Stein Poisson approximation method. Results that accommodate new occurrences of the trait (gene) arising from duplications and transposition occurrences are also described. The $r$-scan statistical analysis is further applied to a multi-sequence combined Poisson model where $\{B_1,\dots,B_l\}$ are generated from $m$ independent Poisson processes $\{A_1,\dots,A_m\}$ such that $B_k = \bigcup_{i \in Z_k} A_i$, where $\{Z_k\}_{1 \leq k \leq l}$ are subsets of $\{1, 2, \dots, m\}$.
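The combined Poisson model in the last sentence can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes the usual definition of an $r$-scan as the sum of $r$ consecutive spacings between ordered points (the abstract defers the definition to the text), and the rates, interval length, index sets $Z_k$, and all function names are hypothetical choices made for illustration only.

```python
import random

def poisson_process(rate, length, rng):
    """Sorted points of a homogeneous Poisson process on [0, length]."""
    points = []
    t = rng.expovariate(rate)  # exponential waiting time to the first point
    while t <= length:
        points.append(t)
        t += rng.expovariate(rate)
    return points

def combine(processes, index_set):
    """B_k = union of A_i for i in Z_k (indices are 1-based, as in the abstract)."""
    return sorted(p for i in index_set for p in processes[i - 1])

def r_scans(points, r):
    """Assumed definition: sums of r consecutive spacings, points[i + r] - points[i]."""
    return [points[i + r] - points[i] for i in range(len(points) - r)]

def extremal_r_scans(points, r):
    """(minimum, maximum) r-scan, or None when there are fewer than r + 1 points."""
    scans = r_scans(points, r)
    return (min(scans), max(scans)) if scans else None

# Hypothetical setup: m = 3 source processes A_1..A_3 and l = 2 sequences B_1, B_2.
rng = random.Random(0)
A = [poisson_process(rate=0.5, length=100.0, rng=rng) for _ in range(3)]
Z = [{1, 2}, {2, 3}]  # B_1 and B_2 share A_2, so they are dependent
B = [combine(A, Z_k) for Z_k in Z]
for k, B_k in enumerate(B, start=1):
    print(f"B_{k}: {len(B_k)} points, extremal 2-scans: {extremal_r_scans(B_k, 2)}")
```

A small minimal $r$-scan points to clustering of the marker and a large maximal $r$-scan to overdispersion. Because the sequences share source processes, their extremal statistics are dependent, which is the situation the Poisson approximation in the abstract is designed to handle.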

Article information

Source
Ann. Appl. Probab., Volume 10, Number 3 (2000), 709-725.

Dates
First available in Project Euclid: 22 April 2002

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1019487507

Digital Object Identifier
doi:10.1214/aoap/1019487507

Mathematical Reviews number (MathSciNet)
MR1789977

Zentralblatt MATH identifier
1084.92506

Subjects
Primary: 60E05: Distributions: general theory
Secondary: 60G50: Sums of independent random variables; random walks

Keywords
$r$-scan statistics; Chen-Stein Poisson approximation; Poisson processes; total variation distance; asymptotic distributions

Citation

Chen, Chingfer; Karlin, Samuel. $r$-scan statistics of a marker array in multiple sequences derived from a common progenitor. Ann. Appl. Probab. 10 (2000), no. 3, 709-725. doi:10.1214/aoap/1019487507. https://projecteuclid.org/euclid.aoap/1019487507