Open Access
Deconvolution of base pair level RNA-Seq read counts for quantification of transcript expression levels
Han Wu, Yu Zhu
Ann. Appl. Stat. 10(3): 1195-1216 (September 2016). DOI: 10.1214/16-AOAS906


RNA-Seq has emerged as the method of choice for profiling the transcriptomes of organisms. In particular, it aims to quantify the expression levels of transcripts using short nucleotide sequences or short reads generated from RNA-Seq experiments. In real experiments, the label of the transcript from which each short read is generated is missing, and short reads are mapped to the genome rather than the transcriptome. Therefore, the quantification of transcript expression levels is an indirect statistical inference problem.

In this article, we propose to use individual exonic base pairs as observation units and, further, to model nonzero as well as zero counts at all base pairs at both the transcript and gene levels. At the transcript level, two-component Poisson mixture distributions are postulated, which gives rise to the Convolution of Poisson mixture (CPM) distribution model at the gene level. The maximum likelihood estimation method equipped with the EM algorithm is used to estimate model parameters and quantify transcript expression levels. We refer to the proposed method as CPM-Seq. Both simulation studies and real data demonstrate the effectiveness of CPM-Seq, showing that CPM-Seq produces more accurate and consistent quantification results than Cufflinks.
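
The two ingredients named above, a two-component Poisson mixture fitted by EM and the convolution of count distributions, can be illustrated with a minimal sketch. This is not the CPM-Seq estimator: the abstract does not give the model's parameterization, so the simulated rates (0.5 and 20), the mixing weight (0.4), the starting values, and all function names below are assumptions chosen only for illustration.

```python
import math
import random


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for rate lam > 0, computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def em_two_poisson(counts, n_iter=100):
    """Fit pi * Pois(lam_low) + (1 - pi) * Pois(lam_high) to counts by EM."""
    n = len(counts)
    mean = sum(counts) / n
    # Assumed starting values: split the overall mean into a low and a high rate.
    pi, lam_low, lam_high = 0.5, 0.5 * mean + 1e-3, 1.5 * mean + 1e-3
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from the low-rate component.
        resp = []
        for k in counts:
            a = pi * poisson_pmf(k, lam_low)
            b = (1.0 - pi) * poisson_pmf(k, lam_high)
            resp.append(a / max(a + b, 1e-300))
        # M-step: responsibility-weighted updates of the weight and the two rates.
        w = min(max(sum(resp), 1e-9), n - 1e-9)
        pi = w / n
        lam_low = max(sum(r * k for r, k in zip(resp, counts)) / w, 1e-9)
        lam_high = max(sum((1.0 - r) * k for r, k in zip(resp, counts)) / (n - w), 1e-9)
    return pi, lam_low, lam_high


def convolve_pmf(p, q):
    """PMF of the sum of two independent counts, given their (truncated) PMFs."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


# Simulated base pair counts for a single hypothetical transcript:
# 40% of positions at a low rate, 60% at a high rate.
rng = random.Random(0)
counts = [sample_poisson(0.5 if rng.random() < 0.4 else 20.0, rng) for _ in range(2000)]
pi_hat, lam_low_hat, lam_high_hat = em_two_poisson(counts)
print(pi_hat, lam_low_hat, lam_high_hat)

# A position shared by two transcripts: the observed count is a sum of two
# independent counts, so its PMF is the convolution of the transcript-level PMFs.
gene_pmf = convolve_pmf(
    [poisson_pmf(k, 1.0) for k in range(30)],
    [poisson_pmf(k, 2.0) for k in range(30)],
)
```

In this toy setting the transcript labels are known by construction. The difficulty described in the abstract is that only the summed, gene-level counts are observed, which is why the likelihood has to be built from the convolved distribution rather than from each mixture separately.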



Han Wu, Yu Zhu. "Deconvolution of base pair level RNA-Seq read counts for quantification of transcript expression levels." Ann. Appl. Stat. 10(3): 1195-1216, September 2016.


Received: 1 October 2013; Revised: 1 October 2015; Published: September 2016
First available in Project Euclid: 28 September 2016

zbMATH: 06775264
MathSciNet: MR3553222

Keywords: convolution, finite Poisson mixture model, RNA-Seq, transcriptome profiling

Rights: Copyright © 2016 Institute of Mathematical Statistics
