Open Access
Network modelling of topological domains using Hi-C data
Y. X. Rachel Wang, Purnamrita Sarkar, Oana Ursu, Anshul Kundaje, Peter J. Bickel
Ann. Appl. Stat. 13(3): 1511-1536 (September 2019). DOI: 10.1214/19-AOAS1244

Abstract

Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of this 3D organization is the presence of topologically associating domains (TADs), which are densely interacting, contiguous chromatin regions that play important roles in regulating gene expression. A few algorithms have been proposed to detect TADs. In particular, the structure of Hi-C data naturally invites the application of community detection methods. However, one drawback of community detection is that most methods take the exchangeability of the nodes in the network for granted, whereas the nodes in this case, that is, the positions on the chromosomes, are not exchangeable. We propose a network model for detecting TADs using Hi-C data that takes this nonexchangeability into account. In addition, our model explicitly makes use of cell-type-specific CTCF binding sites as biological covariates and can be used to identify conserved TADs across multiple cell types. The model leads to a likelihood objective that can be efficiently optimized via relaxation. We also prove that, when suitably initialized, this model finds the underlying TAD structure with high probability. Using simulated data, we show the advantages of our method and the caveats of popular community detection methods, such as spectral clustering, in this application. Applying our method to real Hi-C data, we demonstrate that the identified domains have desirable epigenetic features and compare them across different cell types.
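The key modelling point above is that chromosome positions are ordered, so a domain should be a contiguous interval rather than an arbitrary set of nodes. The sketch below illustrates only that constraint. It is not the authors' likelihood model, which additionally uses CTCF covariates and a relaxation-based optimizer. As a simple stand-in, it scores a candidate block by its within-block contact total minus a hypothetical background rate `lam` per position pair and uses dynamic programming to find the best partition of the positions into contiguous intervals. The names `segment_contiguous`, `block_scores`, `lam`, and `min_size` are illustrative assumptions, and the contact matrix is assumed to be symmetric.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def block_scores(C: Matrix, lam: float) -> List[List[float]]:
    """score[i][j] for the half-open block [i, j): sum of contacts over pairs
    a < b inside the block, minus a hypothetical background rate `lam` per pair.
    Assumes C is symmetric (only the lower triangle is read)."""
    n = len(C)
    score = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        pair_sum = 0.0
        for j in range(i + 1, n + 1):
            k = j - 1  # position newly added to the block [i, j)
            pair_sum += sum(C[k][m] for m in range(i, k))
            n_pairs = (j - i) * (j - i - 1) // 2
            score[i][j] = pair_sum - lam * n_pairs
    return score


def segment_contiguous(C: Matrix, lam: float, min_size: int = 1) -> List[Tuple[int, int]]:
    """Partition positions 0..n-1 into contiguous intervals [start, end) that
    maximize the total block score. Because only intervals are candidates, the
    ordering of positions is respected (no exchangeability is assumed)."""
    n = len(C)
    score = block_scores(C, lam)
    best = [float("-inf")] * (n + 1)  # best[j]: best score for positions [0, j)
    back = [-1] * (n + 1)             # back[j]: start of the last block ending at j
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(0, j - min_size + 1):  # block [i, j) has size >= min_size
            if best[i] == float("-inf"):
                continue
            cand = best[i] + score[i][j]
            if cand > best[j]:
                best[j], back[j] = cand, i
    if n > 0 and back[n] == -1:
        raise ValueError("no segmentation satisfies min_size")
    # Trace back the optimal block boundaries.
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments


if __name__ == "__main__":
    # Toy symmetric matrix: two dense blocks (positions 0-2 and 3-5),
    # contact 5 within a block and 1 across blocks.
    n = 6
    C = [[0.0 if a == b else (5.0 if (a < 3) == (b < 3) else 1.0)
          for b in range(n)] for a in range(n)]
    print(segment_contiguous(C, lam=2.0))  # [(0, 3), (3, 6)]
```

The background rate acts as a resolution knob in this toy setup: with `lam=0.5` the same matrix is returned as a single domain `[(0, 6)]`, and a large `lam` splits it into singletons. Generic community detection, such as spectral clustering, imposes no such interval constraint and may group distant positions without the region between them.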

Citation

Y. X. Rachel Wang. Purnamrita Sarkar. Oana Ursu. Anshul Kundaje. Peter J. Bickel. "Network modelling of topological domains using Hi-C data." Ann. Appl. Stat. 13 (3) 1511 - 1536, September 2019. https://doi.org/10.1214/19-AOAS1244

Information

Received: 1 October 2017; Revised: 1 August 2018; Published: September 2019
First available in Project Euclid: 17 October 2019

zbMATH: 07145966
MathSciNet: MR4019148
Digital Object Identifier: 10.1214/19-AOAS1244

Keywords: Community detection, Hi-C data, network models, topologically associating domains

Rights: Copyright © 2019 Institute of Mathematical Statistics
