The Annals of Applied Statistics

Phylogeny-based tumor subclone identification using a Bayesian feature allocation model

Li Zeng, Joshua L. Warren, and Hongyu Zhao

Tumor cells acquire different genetic alterations during the course of evolution in cancer patients. As a result of competition and selection, only a few subgroups of cells with distinct genotypes survive. These subgroups of cells are often referred to as subclones. In recent years, many statistical and computational methods have been developed to identify tumor subclones, leading to biologically significant discoveries and shedding light on tumor progression, metastasis, drug resistance and other processes. However, most existing methods are either not able to infer the phylogenetic structure among subclones, or not able to incorporate copy number variations (CNV). In this article, we propose SIFA (tumor Subclone Identification by Feature Allocation), a Bayesian model which takes into account both CNV and tumor phylogeny structure to infer tumor subclones. We compare the performance of SIFA with two other commonly used methods using simulation studies with varying sequencing depth, evolutionary tree size, and tree complexity. SIFA consistently yields better results in terms of Rand Index and cellularity estimation accuracy. The usefulness of SIFA is also demonstrated through its application to whole genome sequencing (WGS) samples from four patients in a breast cancer study.

Ann. Appl. Stat., Volume 13, Number 2 (2019), 1212-1241.

Received: May 2017
Revised: August 2018
Keywords: Intra-tumor heterogeneity; latent feature allocation; model selection; tumor evolution


Zeng, Li; Warren, Joshua L.; Zhao, Hongyu. Phylogeny-based tumor subclone identification using a Bayesian feature allocation model. Ann. Appl. Stat. 13 (2019), no. 2, 1212--1241. doi:10.1214/18-AOAS1223.

Supplemental materials

  • Supplement to “Phylogeny-based tumor subclone identification using a Bayesian feature allocation model”. The supplementary materials contain additional plots and tables that help illustrate the simulation and real-data results.