September 2024
Multiscale Poisson process approaches for detecting and estimating differences from high-throughput sequencing assays
Heejung Shim, Zhengrong Xing, Ester Pantaleo, Francesca Luca, Roger Pique-Regi, Matthew Stephens
Ann. Appl. Stat. 18(3): 1773-1788 (September 2024). DOI: 10.1214/23-AOAS1828

Abstract

Estimating and testing for differences in molecular phenotypes (e.g., gene expression, chromatin accessibility, transcription factor binding) across conditions is an important part of understanding the molecular basis of gene regulation. These phenotypes are commonly measured using high-throughput sequencing assays (e.g., RNA-seq, ATAC-seq, ChIP-seq), which provide high-resolution count data that reflect how the phenotypes vary along the genome. Multiple methods have been proposed to help exploit these high-resolution measurements for differential expression analysis. However, they ignore the count nature of the data, instead using normal distributions that work well only for data with large sample sizes or high counts. Here we develop count-based methods to address this problem. We model the data for each sample using an inhomogeneous Poisson process with a spatially structured underlying intensity function and then, building on multiscale models for the Poisson process, estimate and test for differences in the underlying intensity function across samples (or groups of samples). Using both simulation and real ATAC-seq data, we show that our method outperforms previous normal-based methods, especially in situations with small sample sizes or low counts.
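
To make the phrase "multiscale models for the Poisson process" concrete, the following is a minimal sketch of the standard recursive-dyadic decomposition of a count vector, not the authors' implementation. It rests on a well-known property: if two neighbouring counts are independent Poisson variables, then given their sum (the "parent"), the left count is binomial with success probability equal to the left intensity divided by the summed intensity. The function names `multiscale_decompose` and `multiscale_reconstruct`, the power-of-two length restriction, and the example counts are assumptions made for illustration only. How the paper models these binomial proportions across samples, and how it tests for differences, is not shown here.

```python
def multiscale_decompose(counts):
    """Split a count vector into its total and per-scale (left, parent) counts.

    Illustrative helper (hypothetical name). Assumes len(counts) is a power
    of two. levels[0] is the finest scale; the last entry is the coarsest.
    """
    x = list(counts)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length of counts must be a power of two")
    levels = []
    while len(x) > 1:
        left = x[0::2]                                   # left child of each pair
        parent = [a + b for a, b in zip(x[0::2], x[1::2])]  # pairwise sums
        levels.append((left, parent))
        x = parent                                       # move one scale coarser
    return x[0], levels


def multiscale_reconstruct(total, levels):
    """Invert multiscale_decompose: rebuild the original counts."""
    x = [total]
    for left, _parent in reversed(levels):               # coarsest scale first
        out = []
        for l, p in zip(left, x):
            out.extend((l, p - l))                       # right child = parent - left
        x = out
    return x


if __name__ == "__main__":
    counts = [3, 0, 1, 2, 5, 1, 0, 4]                    # made-up example counts
    total, levels = multiscale_decompose(counts)
    print(total)
    for left, parent in levels:
        print(left, parent)
    print(multiscale_reconstruct(total, levels) == counts)
```

Each (left, parent) pair can be read as one binomial observation, so the full count vector is re-expressed as a total plus a set of binomial splits at successively coarser scales. The decomposition is lossless, as the reconstruction step shows.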

Funding Statement

This work was supported by NIH Grant HG002585.

Acknowledgements

We thank Jack Degner for invaluable discussions and Yao-ban Chan for helpful comments on a draft manuscript. We also thank the members of the H. Shim, M. Stephens, and J. Pritchard labs for helpful discussions.

Citation

Heejung Shim, Zhengrong Xing, Ester Pantaleo, Francesca Luca, Roger Pique-Regi, Matthew Stephens. "Multiscale Poisson process approaches for detecting and estimating differences from high-throughput sequencing assays." Ann. Appl. Stat. 18 (3) 1773-1788, September 2024. https://doi.org/10.1214/23-AOAS1828

Information

Received: 1 June 2021; Revised: 1 September 2023; Published: September 2024
First available in Project Euclid: 5 August 2024

Digital Object Identifier: 10.1214/23-AOAS1828

Keywords: ATAC-seq, Bayesian inference, chromatin accessibility, count data, differential expression analysis, DNase-seq, functional data, high-resolution, high-throughput sequencing assays, Multiscale Poisson processes, RNA-Seq, Wavelets

Rights: Copyright © 2024 Institute of Mathematical Statistics

Vol. 18 • No. 3 • September 2024