The Annals of Applied Statistics

A statistical model to assess (allele-specific) associations between gene expression and epigenetic features using sequencing data

Naim U. Rashid, Wei Sun, and Joseph G. Ibrahim


Sequencing techniques have been widely used to assess gene expression (i.e., RNA-seq) or the presence of epigenetic features (e.g., DNase-seq to identify open chromatin regions). In contrast to traditional microarray platforms, sequencing data are typically summarized as discrete counts, and they can delineate allele-specific signals, which are not available from microarrays. The presence of epigenetic features is often associated with gene expression, and both have been shown to be affected by DNA polymorphisms. However, joint models with the flexibility to assess interactions among gene expression, epigenetic features and DNA polymorphisms are currently lacking. In this paper, we develop a statistical model to assess the associations between gene expression and epigenetic features using sequencing data, while explicitly modeling the effects of DNA polymorphisms in either an allele-specific or non-allele-specific manner. We show that this approach provides the flexibility to detect associations between gene expression and epigenetic features, as well as conditional associations given DNA polymorphisms. We evaluate the performance of our method using simulations and apply it to study the association between gene expression and the presence of DNase I hypersensitive sites (DHSs) in HapMap individuals. Our model can be generalized to explore the relationships between DNA polymorphisms and any two types of sequencing experiments, a useful feature as the variety of sequencing experiments continues to expand.
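To make the idea of jointly modeling two kinds of sequencing counts concrete, the following is a minimal sketch of the standard bivariate Poisson log-normal construction: two Poisson counts whose log-rates are drawn from a correlated bivariate normal. It is an illustration only, not the authors' model. The function name `simulate_bpln` and all parameter values are assumptions chosen for the example, and it omits genotype effects, covariates and the allele-specific (binomial) component.

```python
import numpy as np


def simulate_bpln(n, mu, sigma, rho, seed=0):
    """Draw n pairs of counts from a bivariate Poisson log-normal distribution.

    Hypothetical helper for illustration: mu and sigma are the means and
    standard deviations of the two latent log-rates, rho is their correlation.
    """
    rng = np.random.default_rng(seed)
    # Covariance matrix of the latent log-rates.
    cov = np.array([
        [sigma[0] ** 2, rho * sigma[0] * sigma[1]],
        [rho * sigma[0] * sigma[1], sigma[1] ** 2],
    ])
    # Correlated latent log-rates, one row per sample.
    log_rates = rng.multivariate_normal(mu, cov, size=n)
    # Counts are conditionally independent Poisson given the latent rates.
    return rng.poisson(np.exp(log_rates))


# Example: column 0 could stand for an RNA-seq count, column 1 for a DNase-seq count.
counts = simulate_bpln(5000, mu=[3.0, 2.0], sigma=[0.5, 0.5], rho=0.8)
corr = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
print(f"sample correlation between the two count types: {corr:.2f}")
```

In this construction the dependence between the two count types comes entirely from the correlation of the latent log-rates, so setting `rho` to zero yields independent overdispersed counts.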

Article information

Ann. Appl. Stat., Volume 10, Number 4 (2016), 2254-2273.

Received: September 2014
Revised: July 2016
First available in Project Euclid: 5 January 2017

Keywords: Bivariate binomial logistic-normal (BBLN) distribution; bivariate Poisson log-normal (BPLN) distribution; DNase-seq; genetics; genomics; RNA-seq


Rashid, Naim U.; Sun, Wei; Ibrahim, Joseph G. A statistical model to assess (allele-specific) associations between gene expression and epigenetic features using sequencing data. Ann. Appl. Stat. 10 (2016), no. 4, 2254–2273. doi:10.1214/16-AOAS973.


Supplemental materials

  • Supplement to “A statistical model to assess (allele-specific) associations between gene expression and epigenetic features using sequencing data.” Contains details on the numerical maximization procedures for the BBLN and BPLN models.