Open Access
February 2011
Statistical Modeling of RNA-Seq Data
Julia Salzman, Hui Jiang, Wing Hung Wong
Statist. Sci. 26(1): 62-83 (February 2011). DOI: 10.1214/10-STS343

Abstract

Recently, ultra-high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data, as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at fixed sequencing depth. Simulation studies are also given.
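
As a rough illustration of what maximum likelihood estimation of isoform abundance from summarized read counts can look like, the sketch below runs a standard EM iteration for a Poisson count model. It is not taken from the paper: the Poisson form, the function name `em_isoform_abundance`, the matrix `rates` (sampling rate of each isoform for each read category), and the toy numbers are all assumptions made for illustration. The per-category counts play the role of the sufficient statistics mentioned in the abstract.

```python
def em_isoform_abundance(rates, counts, n_iter=500):
    """EM sketch for a Poisson model (an assumption, not the paper's code).

    Assumed model: counts[j] ~ Poisson(sum_i theta[i] * rates[i][j]),
    where rates[i][j] >= 0 is a hypothetical sampling rate of isoform i
    for read category j, and theta[i] is the abundance of isoform i.
    """
    n_iso = len(rates)
    n_cat = len(counts)
    # Total sampling rate of each isoform across all read categories.
    row_sums = [sum(rates[i]) for i in range(n_iso)]
    theta = [1.0] * n_iso  # arbitrary positive starting point

    for _ in range(n_iter):
        # Expected count in each category under the current abundances.
        mu = [sum(theta[i] * rates[i][j] for i in range(n_iso))
              for j in range(n_cat)]
        # Multiplicative EM update: reassign each category's reads to
        # isoforms in proportion to their current expected contribution.
        theta = [
            theta[i] / row_sums[i]
            * sum(counts[j] * rates[i][j] / mu[j]
                  for j in range(n_cat) if mu[j] > 0)
            for i in range(n_iso)
        ]
    return theta


# Toy example (made-up numbers): two isoforms, three read categories.
# Isoform 0 produces reads in categories 0 and 1, isoform 1 in 1 and 2.
toy_rates = [[1.0, 1.0, 0.0],
             [0.0, 1.0, 1.0]]
toy_counts = [10, 30, 20]
theta_hat = em_isoform_abundance(toy_rates, toy_counts)
print(theta_hat)  # approximately [10.0, 20.0]
```

In this toy case the likelihood equations can be solved by hand (abundances 10 and 20 reproduce the observed counts exactly), which gives a simple check on the iteration. Paired-end data would enter such a sketch only through a finer set of read categories and a different `rates` matrix.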

Citation

Julia Salzman. Hui Jiang. Wing Hung Wong. "Statistical Modeling of RNA-Seq Data." Statist. Sci. 26 (1) 62 - 83, February 2011. https://doi.org/10.1214/10-STS343

Information

Published: February 2011
First available in Project Euclid: 9 June 2011

zbMATH: 1219.62173
MathSciNet: MR2849910
Digital Object Identifier: 10.1214/10-STS343

Keywords: Fisher information, Isoform abundance estimation, minimal sufficiency, Paired end RNA-Seq data analysis

Rights: Copyright © 2011 Institute of Mathematical Statistics
