The Annals of Applied Statistics

Phylogeny-based tumor subclone identification using a Bayesian feature allocation model

Li Zeng, Joshua L. Warren, and Hongyu Zhao

Tumor cells acquire different genetic alterations during the course of evolution in cancer patients. As a result of competition and selection, only a few subgroups of cells with distinct genotypes survive. These subgroups of cells are often referred to as subclones. In recent years, many statistical and computational methods have been developed to identify tumor subclones, leading to biologically significant discoveries and shedding light on tumor progression, metastasis, drug resistance and other processes. However, most existing methods either cannot infer the phylogenetic structure among subclones or cannot incorporate copy number variations (CNVs). In this article, we propose SIFA (tumor Subclone Identification by Feature Allocation), a Bayesian model that takes into account both CNVs and tumor phylogeny structure to infer tumor subclones. We compare the performance of SIFA with two other commonly used methods in simulation studies with varying sequencing depth, evolutionary tree size and tree complexity. SIFA consistently yields better results in terms of the Rand Index and cellularity estimation accuracy. The usefulness of SIFA is also demonstrated through its application to whole-genome sequencing (WGS) samples from four patients in a breast cancer study.
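The abstract names two ingredients without spelling them out: a feature allocation constrained by a tumor phylogeny, and the Rand Index used to score how mutations are grouped into subclones. The Python sketch below illustrates both under simplifying assumptions and is not the authors' implementation. It assumes each mutation arises at exactly one node of a rooted tree and is inherited by every descendant node (an infinite-sites style assumption, not necessarily SIFA's exact construction), ignores read counts and CNVs entirely, and uses hypothetical function names (`feature_allocation`, `rand_index`) and toy data.

```python
from itertools import combinations


def feature_allocation(parent, mutation_node):
    """Return a binary matrix Z with Z[j][k] = 1 if mutation j is carried by node k.

    parent[k] is the parent of tree node k (None for the root), and
    mutation_node[j] is the node at which mutation j is assumed to arise.
    Assumption: a mutation arising at node n is carried by n and all of its
    descendants, so it is present in node k exactly when n lies on the path
    from k to the root.
    """

    def path_to_root(k):
        nodes = set()
        while k is not None:
            nodes.add(k)
            k = parent[k]
        return nodes

    lineage = [path_to_root(k) for k in range(len(parent))]
    return [
        [1 if origin in lineage[k] else 0 for k in range(len(parent))]
        for origin in mutation_node
    ]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (Rand, 1971).

    A pair agrees if both clusterings put it in the same cluster,
    or both put it in different clusters.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same length >= 2")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy tree: node 0 is the root; nodes 1 and 2 are its children.
    parent = [None, 0, 0]
    true_nodes = [0, 0, 1, 2]      # hypothetical true origin of 4 mutations
    inferred_nodes = [0, 0, 1, 1]  # hypothetical estimate merging nodes 1 and 2
    print(feature_allocation(parent, true_nodes))
    # [[1, 1, 1], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
    print(round(rand_index(true_nodes, inferred_nodes), 3))
    # 0.833
```

In this toy example the inferred assignment merges the two leaf subclones, so only the pair of mutations arising at nodes 1 and 2 counts as a disagreement, giving 5/6. Scoring pairwise agreement makes the Rand Index invariant to how clusters are labeled, which matters because node indices in an inferred tree are arbitrary.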

Article information

Ann. Appl. Stat., Volume 13, Number 2 (2019), 1212–1241.

Received: May 2017
Revised: August 2018
First available in Project Euclid: 17 June 2019

Digital Object Identifier: doi:10.1214/18-AOAS1223

Keywords: intra-tumor heterogeneity, latent feature allocation, model selection, tumor evolution


Zeng, Li; Warren, Joshua L.; Zhao, Hongyu. Phylogeny-based tumor subclone identification using a Bayesian feature allocation model. Ann. Appl. Stat. 13 (2019), no. 2, 1212–1241. doi:10.1214/18-AOAS1223.

Supplemental materials

  • Supplement to “Phylogeny-based tumor subclone identification using a Bayesian feature allocation model” (DOI:10.1214/18-AOAS1223SUPP). Additional plots and tables illustrating the simulation and real-data results are provided in the supplementary materials.