## Journal of Applied Probability

### Identifiability of a coalescent-based population tree model

Arindam RoyChoudhury

#### Abstract

Identifiability of evolutionary tree models has been a recent topic of discussion, and some models have been shown to be nonidentifiable. A coalescent-based rooted population tree model, originally proposed by Nielsen et al. (1998), has been used by many authors in the last few years and is a simple tool to accurately model the changes in allele frequencies in the tree. However, the identifiability of this model has never been proven. Here we prove this model to be identifiable by showing that the model parameters can be expressed as functions of the probability distributions of subsamples, assuming that there are at least two (haploid) individuals sampled from each population. This is a step toward proving the consistency of the maximum likelihood estimator of the population tree based on this model.

#### Article information

Source
J. Appl. Probab., Volume 51, Number 4 (2014), 921–929.

Dates
First available in Project Euclid: 20 January 2015

https://projecteuclid.org/euclid.jap/1421763318

Mathematical Reviews number (MathSciNet)
MR3301279

Zentralblatt MATH identifier
1333.92053

#### Citation

RoyChoudhury, Arindam. Identifiability of a coalescent-based population tree model. J. Appl. Probab. 51 (2014), no. 4, 921–929. https://projecteuclid.org/euclid.jap/1421763318

#### References

• Allman, E. S., Ané, C. and Rhodes, J. A. (2008). Identifiability of a Markovian model of molecular evolution with gamma-distributed rates. Adv. Appl. Prob. 40, 228–249.
• Bryant, D. et al. (2012). Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol. 29, 1917–1932.
• Chai, J. and Housworth, E. A. (2011). On Rogers' proof of identifiability for the GTR + $\Gamma$ + I model. Syst. Biol. 60, 713–718.
• Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376.
• Kingman, J. F. C. (1982). The coalescent. Stoch. Process. Appl. 13, 235–248.
• Liu, L., Yu, L., Pearl, D. K. and Edwards, S. V. (2009). Estimating species phylogenies using coalescence times among sequences. Syst. Biol. 58, 468–477.
• Matsen, F. A. and Steel, M. (2007). Phylogenetic mixtures on a single tree can mimic a tree of another topology. Syst. Biol. 56, 767–775.
• Nielsen, R. and Slatkin, M. (2000). Likelihood analysis of ongoing gene flow and historical association. Evolution 54, 44–50.
• Nielsen, R., Mountain, J. L., Huelsenbeck, J. P. and Slatkin, M. (1998). Maximum-likelihood estimation of population divergence times and population phylogeny in models without mutation. Evolution 52, 669–677.
• RoyChoudhury, A. (2011). Composite likelihood-based inferences on genetic data from dependent loci. J. Math. Biol. 62, 65–80.
• RoyChoudhury, A. and Thompson, E. A. (2012). Ascertainment correction for a population tree via a pruning algorithm for likelihood computation. Theoret. Pop. Biol. 82, 59–65.
• RoyChoudhury, A., Felsenstein, J. and Thompson, E. A. (2008). A two-stage pruning algorithm for likelihood computation for a population tree. Genetics 180, 1095–1105.
• Steel, M. A., Székely, L. and Hendy, M. D. (1994). Reconstructing trees when sequence sites evolve at variable rates. J. Comp. Biol. 1, 153–163.
• Takahata, N. and Nei, M. (1985). Gene genealogy and variance of interpopulational nucleotide differences. Genetics 110, 325–344.