The Annals of Applied Probability

Population genetics of neutral mutations in exponentially growing cancer cell populations

Rick Durrett


To analyze data from cancer genome sequencing projects, we need to be able to distinguish causative, or “driver,” mutations from “passenger” mutations that have no selective effect. Toward this end, we prove results concerning the frequency of neutral mutations in exponentially growing multitype branching processes that have been widely used in cancer modeling. Our results yield a simple new population genetics result for the site frequency spectrum of a sample from an exponentially growing population.

Article information

Ann. Appl. Probab., Volume 23, Number 1 (2013), 230-250.

First available in Project Euclid: 25 January 2013

Primary: 60J85: Applications of branching processes [See also 92Dxx]; 92D10: Genetics {For genetic algebras, see 17D92}

Exponentially growing population; site frequency spectrum; multitype branching process; cancer model


Durrett, Rick. Population genetics of neutral mutations in exponentially growing cancer cell populations. Ann. Appl. Probab. 23 (2013), no. 1, 230–250. doi:10.1214/11-AAP824.

