The Annals of Applied Statistics

Gene-centric gene–gene interaction: A model-based kernel machine method

Shaoyu Li and Yuehua Cui

Abstract

Much of the natural variation in a complex trait can be explained by variation at the DNA sequence level. As part of sequence variation, gene–gene interaction has been ubiquitously observed in nature, and its role in shaping the development of an organism is broadly recognized. The identification of interactions between genetic factors has been increasingly pursued via statistical and machine learning approaches. A large body of currently adopted methods, whether parametric or nonparametric, focus predominantly on pairwise single-marker interaction analysis. As genes are the functional units in living organisms, an analysis that treats a gene as a system could potentially yield more biologically meaningful results. In this work, we propose a conceptual gene-centric framework for genome-wide gene–gene interaction detection. We treat each gene as a testing unit and derive a model-based kernel machine method for two-dimensional genome-wide scanning of gene–gene interactions. In addition to the biological advantage, our method is statistically appealing because it reduces the number of hypotheses tested in a genome-wide scan. Extensive simulation studies are conducted to evaluate the performance of the method. The utility of the method is further demonstrated with applications to two real data sets. Our method provides a conceptual framework for the identification of gene–gene interactions, which could shed new light on the etiology of complex diseases.
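The reduction in the number of hypotheses can be illustrated with a small back-of-the-envelope calculation. The sketch below is not taken from the paper: the marker and gene counts (`n_snps`, `n_genes`) are hypothetical round numbers, the helper `pairwise_tests` is introduced only for illustration, and the Bonferroni threshold is shown only as a familiar reference point, not as the correction the authors use.

```python
from math import comb


def pairwise_tests(n_units: int) -> int:
    """Number of unordered pairs among n_units testing units."""
    return comb(n_units, 2)


# Hypothetical sizes, chosen only for illustration (not from the paper).
n_snps = 500_000   # single markers in a genome-wide scan
n_genes = 20_000   # genes, each treated as one testing unit

snp_pairs = pairwise_tests(n_snps)
gene_pairs = pairwise_tests(n_genes)

# Bonferroni-style per-test thresholds at a 5% family-wise level,
# shown as a reference point under the assumed counts above.
alpha = 0.05
snp_threshold = alpha / snp_pairs
gene_threshold = alpha / gene_pairs

print(f"SNP-SNP pairs:   {snp_pairs:,}")
print(f"gene-gene pairs: {gene_pairs:,}")
print(f"reduction factor: {snp_pairs / gene_pairs:.1f}")
print(f"per-test threshold (SNP pairs):  {snp_threshold:.2e}")
print(f"per-test threshold (gene pairs): {gene_threshold:.2e}")
```

Under these assumed counts, a two-dimensional scan over gene pairs involves several hundred times fewer tests than a scan over marker pairs, so the per-test significance threshold is correspondingly less stringent.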

Article information

Source
Ann. Appl. Stat., Volume 6, Number 3 (2012), 1134-1161.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1346418577

Digital Object Identifier
doi:10.1214/12-AOAS545

Mathematical Reviews number (MathSciNet)
MR3012524

Zentralblatt MATH identifier
06096525

Keywords
Allele matching kernel; association study; gene-clustered SNPs; genomic similarity; reproducing kernel Hilbert space; quantitative traits

Citation

Li, Shaoyu; Cui, Yuehua. Gene-centric gene–gene interaction: A model-based kernel machine method. Ann. Appl. Stat. 6 (2012), no. 3, 1134–1161. doi:10.1214/12-AOAS545. https://projecteuclid.org/euclid.aoas/1346418577

