The Annals of Applied Statistics

Quantification of multiple tumor clones using gene array and sequencing data

Yichen Cheng, James Y. Dai, Thomas G. Paulson, Xiaoyu Wang, Xiaohong Li, Brian J. Reid, and Charles Kooperberg



Cancer development is driven by genomic alterations, including copy number aberrations. Detecting copy number aberrations in tumor cells is often complicated by contamination of tumor samples with normal stromal cells and by intratumor heterogeneity, namely the presence of multiple clones of tumor cells. To quantify copy number aberrations correctly, it is critical to deconvolve the complex structure of the genetic information in tumor samples. In this article, we propose a general Bayesian method for estimating copy number aberrations when there are normal cells and potentially more than one tumor clone. Our method provides posterior probabilities for the proportions of tumor clones and normal cells. We incorporate prior information on the distribution of the copy numbers to prioritize biologically more plausible solutions and to alleviate identifiability issues that have been observed by many researchers. Our model is flexible and works with both SNP array and next-generation sequencing data. We compare our method to existing ones and illustrate the advantages of our approach on multiple datasets.
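The identifiability issue mentioned in the abstract can be illustrated with generic mixing arithmetic. The sketch below is not the authors' model: it only assumes that the average copy number measured for one genomic segment is a proportion-weighted mean over normal cells (taken to carry 2 copies) and tumor clones. The function name and all numeric values are hypothetical.

```python
def expected_copy_number(clone_fractions, clone_copy_numbers, normal_copy_number=2):
    """Average copy number of one segment in a mixed sample.

    clone_fractions: fraction of cells in each tumor clone; the remainder
        of the sample is assumed to be normal cells.
    clone_copy_numbers: integer copy number of each clone in this segment.
    """
    normal_fraction = 1.0 - sum(clone_fractions)
    tumor_part = sum(p * c for p, c in zip(clone_fractions, clone_copy_numbers))
    return normal_fraction * normal_copy_number + tumor_part


# Three different (hypothetical) compositions of the same segment:
a = expected_copy_number([0.5], [4])            # 50% normal, one clone with 4 copies
b = expected_copy_number([1.0], [3])            # pure tumor, one clone with 3 copies
c = expected_copy_number([0.25, 0.25], [3, 5])  # 50% normal, two clones

print(a, b, c)  # all three give the same average
```

Because all three compositions produce the same average for a single segment, the segment's signal alone cannot separate them. This is the kind of ambiguity that joint modeling across segments and prior information on copy numbers are meant to reduce.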

Article information

Ann. Appl. Stat., Volume 11, Number 2 (2017), 967-991.

Received: May 2016
Revised: October 2016
First available in Project Euclid: 20 July 2017

Keywords: copy number aberration; intratumor heterogeneity; identifiability; BIC


Cheng, Yichen; Dai, James Y.; Paulson, Thomas G.; Wang, Xiaoyu; Li, Xiaohong; Reid, Brian J.; Kooperberg, Charles. Quantification of multiple tumor clones using gene array and sequencing data. Ann. Appl. Stat. 11 (2017), no. 2, 967–991. doi:10.1214/17-AOAS1026.


