The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 10, Number 3 (2016), 1217-1244.
Gene-proximity models for genome-wide association studies
Motivated by the important problem of detecting associations between genetic markers and binary traits in genome-wide association studies, we present a novel Bayesian model that establishes a hierarchy between markers and genes by defining weights according to gene lengths and the distances from genes to markers. The proposed hierarchical model uses these weights to define marker-specific prior probabilities of association based on each marker's proximity to genes believed to be relevant to the trait of interest. We first use an expectation-maximization algorithm in a filtering step to reduce the dimensionality of the data, and then sample from the posterior distribution of the model parameters to estimate posterior probabilities of association for the markers. We offer practical and meaningful guidelines for selecting the model's tuning parameters and propose a pipeline that applies a singular value decomposition to the raw data so that the model runs efficiently on large data sets. We demonstrate the performance of the model in simulation studies and conclude by discussing the results of a case study using a real-world data set provided by the Wellcome Trust Case Control Consortium.
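The abstract says only that weights depend on gene lengths and gene-to-marker distances and that they drive each marker's prior probability of association; it does not give the functional form. The sketch below is therefore a hypothetical illustration, not the authors' model: it assumes a weight of 1 inside a gene that decays exponentially with distance measured in units of `scale` times the gene length, and a prior that interpolates between a baseline `base` and a ceiling `boost` using the strongest weight among genes flagged as relevant. The names `Gene`, `proximity_weight`, `prior_association`, `base`, `boost`, and `scale` are all invented for this example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Gene:
    start: int       # start coordinate in base pairs
    end: int         # end coordinate in base pairs (end > start)
    relevant: bool   # hypothetical flag: gene believed relevant to the trait

    @property
    def length(self) -> int:
        # Guard against zero-length annotations so the weight never divides by zero.
        return max(self.end - self.start, 1)


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Base-pair distance from a marker to the nearest gene boundary (0 if inside)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def proximity_weight(pos: int, gene: Gene, scale: float = 1.0) -> float:
    """Hypothetical weight in (0, 1]: equals 1 inside the gene and decays
    exponentially with distance measured in units of scale * gene length,
    so longer genes 'reach' further along the chromosome."""
    d = distance_to_gene(pos, gene)
    return math.exp(-d / (scale * gene.length))


def prior_association(pos: int, genes: Sequence[Gene], base: float = 0.01,
                      boost: float = 0.5, scale: float = 1.0) -> float:
    """Hypothetical marker-specific prior probability of association.

    Interpolates between `base` (far from every relevant gene) and `boost`
    (inside a relevant gene), using the largest proximity weight over the
    genes flagged as relevant; irrelevant genes do not raise the prior.
    """
    w = max((proximity_weight(pos, g, scale) for g in genes if g.relevant),
            default=0.0)
    return base + (boost - base) * w


if __name__ == "__main__":
    genes = [Gene(1_000, 2_000, relevant=True), Gene(5_000, 9_000, relevant=False)]
    for pos in (1_500, 2_500, 3_000, 6_000, 20_000):
        print(pos, round(prior_association(pos, genes), 4))
```

Scaling the distance by gene length is just one simple way to let both quantities named in the abstract enter the weight; the paper's actual weighting function, its tuning parameters, and how the weights enter the hierarchical prior may differ.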
Received: October 2013
Revised: September 2015
First available in Project Euclid: 28 September 2016
Johnston, Ian; Hancock, Timothy; Mamitsuka, Hiroshi; Carvalho, Luis. Gene-proximity models for genome-wide association studies. Ann. Appl. Stat. 10 (2016), no. 3, 1217--1244. doi:10.1214/16-AOAS907. https://projecteuclid.org/euclid.aoas/1475069606
- Extended results tables and figures. We provide figures and tables to summarize the results of additional simulation studies with less stringent effect sizes as well as the findings in our case study.