Bayesian Analysis

Nonparametric Bayesian Bi-Clustering for Next Generation Sequencing Count Data

Yanxun Xu, Juhee Lee, Yuan Yuan, Riten Mitra, Shoudan Liang, Peter Müller, and Yuan Ji


Histone modifications (HMs) play important roles in transcription through post-translational modifications. Combinations of HMs, known as chromatin signatures, encode specific messages for gene regulation. We therefore expect that inference on possible clustering of HMs and an annotation of genomic locations on the basis of such clustering can contribute new insights about the functions of regulatory elements and their relationships to combinations of HMs. We propose a nonparametric Bayesian local clustering Poisson model (NoB-LCP) to facilitate posterior inference on two-dimensional clustering of HMs and genomic locations. The NoB-LCP clusters HMs into HM sets and lets each HM set define its own clustering of genomic locations. Furthermore, it probabilistically excludes HMs and genomic locations that are irrelevant to clustering. By doing so, the proposed model effectively identifies important sets of HMs and groups regulatory elements with similar functionality based on HM patterns.
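The nested structure described in the abstract can be illustrated with a small forward simulation. The sketch below is not the authors' NoB-LCP specification or their posterior sampler. It assumes a Chinese restaurant process for both levels of clustering, a fixed exclusion probability `p_inactive` (label 0 marks an excluded HM or location), a Gamma prior on Poisson rates, and a shared `background` rate for excluded cells. All function names, parameters, and default values are hypothetical and chosen only for illustration.

```python
import math
import random


def crp_partition(n, alpha, p_inactive, rng):
    """Partition n items with a Chinese restaurant process.

    Label 0 marks an item excluded from clustering (an assumption of this
    sketch); active clusters are labeled 1, 2, ... in order of appearance.
    """
    labels = []
    sizes = []  # sizes[k - 1] is the current size of active cluster k
    for _ in range(n):
        if rng.random() < p_inactive:
            labels.append(0)
            continue
        # Join an existing cluster with weight equal to its size,
        # or open a new cluster with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k + 1)
    return labels


def poisson(lam, rng):
    """Draw a Poisson variate by Knuth's method (adequate for modest rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(n_hms, n_locs, seed=0, alpha=1.0, p_inactive=0.2, background=1.0):
    """Simulate a locations-by-HMs count matrix with nested clustering."""
    rng = random.Random(seed)
    # Level 1: cluster HMs into HM sets (0 = HM excluded).
    hm_set = crp_partition(n_hms, alpha, p_inactive, rng)
    n_sets = max(hm_set, default=0)
    # Level 2: each HM set gets its own partition of the genomic locations.
    loc_cluster = {
        s: crp_partition(n_locs, alpha, p_inactive, rng)
        for s in range(1, n_sets + 1)
    }
    rates = {}  # (HM index, location cluster) -> Poisson rate
    counts = [[0] * n_hms for _ in range(n_locs)]
    for h in range(n_hms):
        s = hm_set[h]
        for i in range(n_locs):
            c = loc_cluster[s][i] if s > 0 else 0
            if c == 0:
                lam = background  # excluded HM or excluded location
            else:
                if (h, c) not in rates:
                    rates[(h, c)] = rng.gammavariate(2.0, 5.0)
                lam = rates[(h, c)]
            counts[i][h] = poisson(lam, rng)
    return hm_set, loc_cluster, counts
```

Calling `simulate(5, 8, seed=1)` returns the HM-set labels, one location partition per HM set, and an 8-by-5 count matrix. The point of the sketch is only the data structure: locations that share a cluster under one HM set need not share a cluster under another. Posterior inference, which the paper carries out by Markov chain Monte Carlo, is not shown.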

Article information

Bayesian Anal., Volume 8, Number 4 (2013), 759-780.

First available in Project Euclid: 4 December 2013

Keywords: ChIP-Seq; Histone modifications; Nonparametric Bayes; Bi-Clustering; Markov chain Monte Carlo


Xu, Yanxun; Lee, Juhee; Yuan, Yuan; Mitra, Riten; Liang, Shoudan; Müller, Peter; Ji, Yuan. Nonparametric Bayesian Bi-Clustering for Next Generation Sequencing Count Data. Bayesian Anal. 8 (2013), no. 4, 759--780. doi:10.1214/13-BA822.

