Open Access
June 2012
Improving sequence-based genotype calls with linkage disequilibrium and pedigree information
Baiyu Zhou, Alice S. Whittemore
Ann. Appl. Stat. 6(2): 457-475 (June 2012). DOI: 10.1214/11-AOAS527


Whole and targeted sequencing of human genomes is a promising, increasingly feasible tool for discovering genetic contributions to risk of complex diseases. A key step is calling an individual’s genotype from the multiple aligned short read sequences of that individual’s DNA, each of which is subject to nucleotide read error. Current methods are designed to call genotypes separately at each locus from the sequence data of unrelated individuals. Here we propose likelihood-based methods that improve calling accuracy by exploiting two features of sequence data. The first is the linkage disequilibrium (LD) between nearby SNPs. The second is the Mendelian pedigree information available when related individuals are sequenced. In both cases the likelihood involves the probabilities of read variant counts given genotypes, summed over the unobserved genotypes. Parameters governing the prior genotype distribution and the read error rates can be estimated either from the sequence data itself or from external reference data. We use simulations and synthetic read data based on the 1000 Genomes Project to evaluate the performance of the proposed methods. An R program to apply the methods to small families is freely available at
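To make the likelihood structure described in the abstract concrete, the following is a minimal single-locus sketch, not the authors' implementation (their software is an R program). It assumes a binomial model for the number of variant reads given the genotype, a symmetric per-read error rate, and a genotype prior supplied by the caller. The function names `genotype_likelihoods`, `hwe_prior`, and `call_genotype` are hypothetical, and the Hardy-Weinberg prior is only a stand-in for the LD-based or pedigree-based priors the paper proposes.

```python
from math import comb


def genotype_likelihoods(n_reads, n_alt, error_rate):
    """P(n_alt variant reads among n_reads | genotype g) for g = 0, 1, 2.

    Assumption (not from the paper): each read samples one of the two
    chromosomes at random and is misread with probability error_rate.
    """
    liks = []
    for g in (0, 1, 2):
        # Chance that a single read shows the variant allele given g copies.
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        liks.append(
            comb(n_reads, n_alt) * p_alt**n_alt * (1 - p_alt) ** (n_reads - n_alt)
        )
    return liks


def hwe_prior(alt_freq):
    """Hardy-Weinberg genotype prior; a placeholder for an LD or pedigree prior."""
    q = alt_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q**2]


def call_genotype(n_reads, n_alt, error_rate, prior):
    """Return the maximum-posterior genotype and the full posterior."""
    liks = genotype_likelihoods(n_reads, n_alt, error_rate)
    joint = [lik * p for lik, p in zip(liks, prior)]
    # Marginal likelihood of the read counts: sum over the unobserved genotype.
    total = sum(joint)
    posterior = [j / total for j in joint]
    best = max(range(3), key=lambda g: posterior[g])
    return best, posterior


if __name__ == "__main__":
    # Low coverage: two reads, neither showing the variant.
    print(call_genotype(2, 0, 0.01, hwe_prior(0.2)))
    # Same reads, but a prior concentrated on the heterozygote, as might
    # arise from relatives' genotypes or a neighbouring SNP in strong LD.
    print(call_genotype(2, 0, 0.01, [0.05, 0.90, 0.05]))
```

At low read depth the prior can dominate the call: with two non-variant reads, the Hardy-Weinberg prior above favours the homozygous reference genotype, while the prior concentrated on the heterozygote flips the call. This is the mechanism by which better-informed priors from LD or pedigree structure could improve accuracy.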


Baiyu Zhou, Alice S. Whittemore. "Improving sequence-based genotype calls with linkage disequilibrium and pedigree information." Ann. Appl. Stat. 6 (2): 457-475, June 2012.


Published: June 2012
First available in Project Euclid: 11 June 2012

zbMATH: 1243.62138
MathSciNet: MR2976478
Digital Object Identifier: 10.1214/11-AOAS527

Keywords: Genotype calls, human genome sequencing, linkage disequilibrium, pedigrees

Rights: Copyright © 2012 Institute of Mathematical Statistics
