The Annals of Applied Statistics

A Bayesian graphical model for genome-wide association studies (GWAS)

Laurent Briollais, Adrian Dobra, Jinnan Liu, Matt Friedlander, Hilmi Ozcelik, and Hélène Massam

Full-text: Open access


The analysis of GWAS data has long been restricted to simple models that cannot fully capture the genetic architecture of complex human diseases. As a shift from standard approaches, we propose a general statistical framework for multi-SNP analysis of GWAS data based on a Bayesian graphical model. Our goal is to develop a general approach applicable to a wide range of genetic association problems, including GWAS and fine-mapping studies, and, more specifically, to be able to: (1) assess the joint effect of multiple SNPs that may be linked or unlinked and may or may not interact; (2) explore the multi-SNP model space efficiently using the Mode Oriented Stochastic Search (MOSS) algorithm and determine the best models. We illustrate our new methodology with an application to the CGEM breast cancer GWAS data. Our algorithm selected several SNPs embedded in multi-locus models with high posterior probabilities. Most of the selected SNPs are biologically relevant. Interestingly, several of them have never been detected in standard single-SNP analyses. Finally, our approach has been implemented in the open source R package genMOSS.

Article information

Ann. Appl. Stat., Volume 10, Number 2 (2016), 786-811.

Received: March 2013
Revised: September 2015
First available in Project Euclid: 22 July 2016


Keywords

Graphical model Bayesian stochastic search GWAS SNP breast cancer


Briollais, Laurent; Dobra, Adrian; Liu, Jinnan; Friedlander, Matt; Ozcelik, Hilmi; Massam, Hélène. A Bayesian graphical model for genome-wide association studies (GWAS). Ann. Appl. Stat. 10 (2016), no. 2, 786–811. doi:10.1214/16-AOAS909.



