Open Access
Detecting mutations in mixed sample sequencing data using empirical Bayes
Omkar Muralidharan, Georges Natsoulis, John Bell, Hanlee Ji, Nancy R. Zhang
Ann. Appl. Stat. 6(3): 1047-1067 (September 2012). DOI: 10.1214/12-AOAS538


We develop statistically based methods to detect single nucleotide DNA mutations in next-generation sequencing data. Sequencing generates counts of the number of times each base was observed at hundreds of thousands to billions of genome positions in each sample. Using these counts to detect mutations is challenging because mutations may have very low prevalence and sequencing error rates vary dramatically by genome position. The discreteness of sequencing data also creates a difficult multiple testing problem: current false discovery rate methods are designed for continuous data and work poorly, if at all, on discrete data.

We show that a simple randomization technique lets us use continuous false discovery rate methods on discrete data. Our approach is a useful way to estimate false discovery rates for any collection of discrete test statistics, and is hence not limited to sequencing data. We then use an empirical Bayes model to capture different sources of variation in sequencing error rates. The resulting method outperforms existing detection approaches on example data sets.
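The abstract does not spell out the randomization step, so the following is only a minimal sketch of one standard construction, the randomized p-value, and not necessarily the authors' exact procedure. For a discrete statistic X observed at x, the value P(X > x) + U * P(X = x) with U drawn uniformly from [0, 1] is exactly uniform under the null, so a continuous false discovery rate method can be applied to it. The binomial error model, the depth of 200 reads, the error rate of 0.01, the simulated counts, and all function names are assumptions made for illustration; the Benjamini-Hochberg step stands in for whichever continuous method is preferred.

```python
import math
import random


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_pvalue(x, n, p0, rng):
    """Randomized upper-tail p-value: P(X > x) + U * P(X = x), U ~ Uniform(0, 1).

    Under the null X ~ Binomial(n, p0) this value is uniform on [0, 1].
    """
    tail_above = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    return tail_above + rng.random() * binom_pmf(x, n, p0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:cutoff])


# Hypothetical data: non-reference base counts at 503 positions,
# each sequenced to an assumed depth of 200 with an assumed error rate of 0.01.
rng = random.Random(0)
depth, error_rate = 200, 0.01
null_counts = [
    sum(rng.random() < error_rate for _ in range(depth)) for _ in range(500)
]
mutant_counts = [30, 25, 40]  # made-up positions carrying a real variant
counts = null_counts + mutant_counts

pvals = [randomized_pvalue(x, depth, error_rate, rng) for x in counts]
rejected = benjamini_hochberg(pvals, alpha=0.1)
print(len(rejected), "positions flagged")
```

In this sketch every position shares one known error rate. The paper's point is that error rates vary by position, which is what its empirical Bayes model addresses; that part is not reproduced here.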




Published: September 2012
First available in Project Euclid: 31 August 2012

zbMATH: 1254.62114
MathSciNet: MR3012520
Digital Object Identifier: 10.1214/12-AOAS538

Keywords: discrete data, DNA sequencing, empirical Bayes, false discovery rates, genome variation

Rights: Copyright © 2012 Institute of Mathematical Statistics
