The Annals of Applied Probability

The ages of mutations in gene trees

R. C. Griffiths and Simon Tavaré

Full-text: Open access

Abstract

Under the infinitely many sites mutation model, the mutational history of a sample of DNA sequences can be described by a unique gene tree. We show how to find the conditional distribution of the ages of the mutations and the time to the most recent common ancestor of the sample, given this gene tree. Explicit expressions for such distributions seem impossible to find for the sample sizes of interest in practice. We resort to a Monte Carlo method to approximate these distributions. We use this method to study the effects of variable population size and variable mutation rates, the distribution of the time to the most recent common ancestor of the population and the distribution of other functionals of the underlying coalescent process, conditional on the sample gene tree.
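
As a rough illustration of the Monte Carlo idea, the sketch below estimates the mean time to the most recent common ancestor by simulation. It is not the authors' algorithm. It makes three simplifying assumptions: a constant-size population under the standard coalescent, mutations falling on the tree as a Poisson process of rate theta/2 per lineage, and conditioning only on the number of segregating sites s rather than on the full gene tree. The function names and parameters (simulate_tree, conditional_mean_tmrca, theta, reps) are hypothetical.

```python
import math
import random


def simulate_tree(n, rng):
    """Simulate coalescence times for n lineages (standard coalescent).

    Returns (tmrca, total_length) in coalescent time units.
    Assumption: constant population size, so the time spent with
    k lineages is exponential with rate k(k-1)/2.
    """
    tmrca = 0.0
    total_length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2.0)
        tmrca += t_k
        total_length += k * t_k  # k branches are present during t_k
    return tmrca, total_length


def conditional_mean_tmrca(n, s, theta, reps, seed=0):
    """Weighted Monte Carlo estimate of E[TMRCA | s segregating sites].

    Each simulated tree is weighted by the Poisson probability of
    placing s mutations on it (mean theta * total_length / 2).
    The factor 1/s! is the same for every tree and cancels in the ratio.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(reps):
        tmrca, total_length = simulate_tree(n, rng)
        lam = theta * total_length / 2.0
        weight = math.exp(-lam) * lam ** s
        weighted_sum += weight * tmrca
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical inputs: 10 sequences, 10 segregating sites, theta = 2.
    print(conditional_mean_tmrca(n=10, s=10, theta=2.0, reps=20000))
```

Weighting every simulated tree avoids discarding trees that do not match the data, but the estimate degrades when few trees carry most of the weight. Conditioning on the full gene tree, as the paper does, needs a more elaborate scheme than this sketch.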

Article information

Source
Ann. Appl. Probab., Volume 9, Number 3 (1999), 567-590.

Dates
First available in Project Euclid: 21 August 2002

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1029962804

Digital Object Identifier
doi:10.1214/aoap/1029962804

Mathematical Reviews number (MathSciNet)
MR1722273

Zentralblatt MATH identifier
0948.92016

Subjects
Primary:
  • 60J70: Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.) [See also 92Dxx]
  • 62M05: Markov processes: estimation
  • 65U05
  • 92D10: Genetics {For genetic algebras, see 17D92}
  • 92D20: Protein sequences, DNA sequences

Keywords
Ages of mutations; ancestral inference; coalescent process; gene trees; population genetics; samples of DNA

Citation

Griffiths, R. C.; Tavaré, Simon. The ages of mutations in gene trees. Ann. Appl. Probab. 9 (1999), no. 3, 567-590. doi:10.1214/aoap/1029962804. https://projecteuclid.org/euclid.aoap/1029962804
