The Annals of Applied Statistics

Hierarchical Bayesian analysis of somatic mutation data in cancer

Jie Ding, Lorenzo Trippa, Xiaogang Zhong, and Giovanni Parmigiani


Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim to discover genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data and to shed light on the overall proportion of drivers among sequenced genes. Our methodology applies to different experimental designs used in practice, including one-stage, two-stage and candidate gene designs. Sample sizes in these studies are typically small relative to the rarity of individual mutations. We therefore use a shrinkage method that borrows strength from the whole genome in assessing individual genes, which reinforces inference and addresses the selection effects induced by multistage designs. Our simulation studies show that the posterior driver probabilities provide a nearly unbiased false discovery rate estimate. We apply our methods to pancreatic and breast cancer data, contrast our results with previous estimates and provide estimated proportions of drivers for these two types of cancer.
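
The link between posterior driver probabilities and a false discovery rate estimate can be illustrated with a minimal sketch. It is not the authors' hierarchical model. It only shows the standard Bayesian calculation in which the estimated FDR of a reported gene list is the average posterior probability that each listed gene is a passenger. The function name `estimated_fdr`, the threshold, and the probabilities below are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def estimated_fdr(driver_probs: List[float], threshold: float) -> Tuple[List[int], float]:
    """Report genes whose posterior driver probability is at least `threshold`.

    Returns the indices of the reported genes and the estimated FDR of that
    list, computed as the mean of (1 - posterior driver probability) over
    the reported genes. An empty list is given an estimated FDR of 0.0.
    """
    selected = [i for i, p in enumerate(driver_probs) if p >= threshold]
    if not selected:
        return selected, 0.0
    fdr = sum(1.0 - driver_probs[i] for i in selected) / len(selected)
    return selected, fdr


# Made-up posterior driver probabilities for five genes (assumption, not data
# from the paper).
probs = [0.99, 0.90, 0.60, 0.20, 0.05]
selected, fdr = estimated_fdr(probs, threshold=0.5)
print(selected, round(fdr, 2))
```

With these made-up inputs, the first three genes are reported, and the estimated FDR is the average of 0.01, 0.10 and 0.40. Lowering the threshold lengthens the list at the cost of a higher estimated FDR.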

Article information

Ann. Appl. Stat., Volume 7, Number 2 (2013), 883-903.

First available in Project Euclid: 27 June 2013

Keywords: Somatic mutations; drivers and passengers; hierarchical Bayesian model; pancreatic and breast cancer


Ding, Jie; Trippa, Lorenzo; Zhong, Xiaogang; Parmigiani, Giovanni. Hierarchical Bayesian analysis of somatic mutation data in cancer. Ann. Appl. Stat. 7 (2013), no. 2, 883–903. doi:10.1214/12-AOAS604.


