The Annals of Applied Statistics

Bayesian group Lasso for nonparametric varying-coefficient models with application to functional genome-wide association studies

Jiahan Li, Zhong Wang, Runze Li, and Rongling Wu

Abstract

Although genome-wide association studies (GWAS) have proven powerful for comprehending the genetic architecture of complex traits, they are challenged by the high dimensionality of single-nucleotide polymorphisms (SNPs) as predictors, the presence of complex environmental factors, and the longitudinal or functional nature of many complex traits or diseases. To address these challenges, we propose a high-dimensional varying-coefficient model that incorporates functional aspects of phenotypic traits into GWAS, yielding a so-called functional GWAS (fGWAS). The Bayesian group lasso and the associated MCMC algorithms are developed to identify significant SNPs and estimate how they affect longitudinal traits through time-varying genetic actions. The model is generalized to analyze the genetic control of complex traits using subject-specific sparse longitudinal data. The statistical properties of the new model are investigated through simulation studies. We use the new model to analyze a real GWAS data set from the Framingham Heart Study, leading to the identification of several significant SNPs associated with age-specific changes in body mass index. The fGWAS model, equipped with the Bayesian group lasso, provides a useful tool for genetic and developmental analysis of complex traits or diseases.
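
As a rough illustration of the kind of model the abstract describes, a varying-coefficient regression with group-wise shrinkage can be sketched as follows. All notation here is assumed for illustration and is not taken from the paper: $y_i(t)$ is the trait of subject $i$ at time $t$, $x_{ij}$ codes the genotype of SNP $j$, $P_k$ is the Legendre polynomial of order $k$, and $\lambda$ is a shrinkage parameter. The actual model may include further terms, such as covariates or separate additive and dominance effects.

```latex
% Varying-coefficient model: each SNP effect is a function of time
y_i(t) = \mu(t) + \sum_{j=1}^{p} \beta_j(t)\, x_{ij} + e_i(t)

% Basis expansion of each time-varying effect (Legendre polynomials assumed)
\beta_j(t) \approx \sum_{k=0}^{K} b_{jk}\, P_k(t),
\qquad \mathbf{b}_j = (b_{j0}, \ldots, b_{jK})^{\top}

% Group-lasso-type prior: all coefficients of SNP j are shrunk together
\pi(\mathbf{b}_j \mid \sigma^2) \propto
\exp\!\left( -\frac{\lambda}{\sigma}\, \lVert \mathbf{b}_j \rVert_2 \right)
```

Under this sketch, selecting a SNP amounts to deciding whether its whole coefficient vector $\mathbf{b}_j$ is nonzero, which is why a group penalty rather than a coefficient-wise lasso penalty is the natural choice.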

Article information

Source
Ann. Appl. Stat., Volume 9, Number 2 (2015), 640-664.

Dates
Received: December 2012
Revised: January 2015
First available in Project Euclid: 20 July 2015

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1437397105

Digital Object Identifier
doi:10.1214/15-AOAS808

Mathematical Reviews number (MathSciNet)
MR3371329

Zentralblatt MATH identifier
06499924

Keywords
Bayesian approach; group variable selection; longitudinal data; GWAS

Citation

Li, Jiahan; Wang, Zhong; Li, Runze; Wu, Rongling. Bayesian group Lasso for nonparametric varying-coefficient models with application to functional genome-wide association studies. Ann. Appl. Stat. 9 (2015), no. 2, 640–664. doi:10.1214/15-AOAS808. https://projecteuclid.org/euclid.aoas/1437397105


Supplemental materials

  • Convergence diagnostics and summary of parameter estimates. We plot the potential scale reduction factor (PSRF) against iterations and summarize the average estimates, standard errors and mean squared errors (MSEs) of corresponding Legendre coefficients for the first five genetic predictors.
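
The potential scale reduction factor mentioned above is the standard Gelman-Rubin convergence diagnostic for multiple MCMC chains. The following is a minimal sketch of how it could be computed and traced over iterations for a single scalar parameter. The function names `psrf` and `psrf_trace` and the input layout (a list of equal-length chains) are assumptions made for illustration, not the authors' code.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for one scalar parameter.

    `chains` is a list of m >= 2 chains, each a list of n >= 2 draws
    (hypothetical layout chosen for this sketch).
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # Within-chain variance: average of the per-chain sample variances.
    within = mean([variance(c) for c in chains])
    # Between-chain variance: n times the sample variance of the chain means.
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


def psrf_trace(chains, start=2, step=1):
    """PSRF computed on the first k draws of every chain, for growing k."""
    n = len(chains[0])
    return [(k, psrf([c[:k] for c in chains]))
            for k in range(start, n + 1, step)]
```

Plotting the pairs returned by `psrf_trace` gives a PSRF-versus-iteration curve; values that settle close to 1 are usually read as a sign that the chains have mixed.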