Open Access
September 2007
Hidden Markov Dirichlet process: modeling genetic inference in open ancestral space
Kyung-Ah Sohn, Eric P. Xing
Bayesian Anal. 2(3): 501-527 (September 2007). DOI: 10.1214/07-BA220

Abstract

The problem of inferring the population structure, linkage disequilibrium pattern, and chromosomal recombination hotspots from genetic polymorphism data is essential for understanding the origin and characteristics of genome variations, with important applications to the genetic analysis of disease propensities and other complex traits. Statistical genetic methodologies developed so far mostly address these problems separately using specialized models ranging from coalescence and admixture models for population structures, to hidden Markov models and renewal processes for recombination; but most of these approaches ignore the inherent uncertainty in the genetic complexity (e.g., the number of genetic founders of a population) of the data and the close statistical and biological relationships among objects studied in these problems. We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a first-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme, and we present experimental results on joint inference of population structure, linkage disequilibrium, and recombination hotspots based on HMDP. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; in addition, it infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations.
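
The generative idea in the abstract (an ancestor chosen per locus from an unbounded founder pool by a first-order Markov process, followed by mutation) can be illustrated with a toy Python sketch. This is not the authors' model or sampler: it uses a single-level Pólya urn rather than the two-level nested scheme, and the names `polya_urn_draw`, `sample_haplotype`, `alpha`, `recomb_prob`, and `mut_prob`, along with the binary allele coding and uniform founder alleles, are assumptions made for illustration only.

```python
import random


def polya_urn_draw(counts, alpha, rng):
    """Draw a founder index from a single-level Polya urn.

    Existing founder k is chosen with weight counts[k]; a brand-new
    founder (index len(counts)) is chosen with weight alpha.
    """
    total = sum(counts) + alpha
    u = rng.random() * total
    acc = 0.0
    for k, c in enumerate(counts):
        acc += c
        if u < acc:
            return k
    return len(counts)  # open a new founder


def sample_haplotype(n_loci, counts, founders, alpha, recomb_prob, mut_prob, rng):
    """Generate one binary haplotype under the simplified assumptions above.

    counts and founders are shared across calls and updated in place, so
    the pool of founders can grow as more haplotypes are generated.
    """
    ancestors, alleles = [], []
    for t in range(n_loci):
        if t == 0 or rng.random() < recomb_prob:
            # Recombination (or chain start): pick an ancestor from the urn.
            k = polya_urn_draw(counts, alpha, rng)
            if k == len(counts):
                # Assumed prior: new founder alleles are uniform on {0, 1}.
                counts.append(0)
                founders.append([rng.randint(0, 1) for _ in range(n_loci)])
            counts[k] += 1
        else:
            # No recombination: inherit from the same founder as the previous locus.
            k = ancestors[-1]
        ancestors.append(k)
        allele = founders[k][t]
        if rng.random() < mut_prob:
            allele = 1 - allele  # point mutation flips the founder allele
        alleles.append(allele)
    return ancestors, alleles
```

Calling `sample_haplotype` repeatedly with the same `counts` and `founders` lists lets popular founders be reused more often while still allowing new ones to appear, which is the "open ancestral space" behavior the abstract describes, here in a deliberately reduced form.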

Citation

Kyung-Ah Sohn, Eric P. Xing. "Hidden Markov Dirichlet process: modeling genetic inference in open ancestral space." Bayesian Anal. 2(3): 501-527, September 2007. https://doi.org/10.1214/07-BA220

Information

Published: September 2007
First available in Project Euclid: 22 June 2012

zbMATH: 1332.62352
MathSciNet: MR2342173
Digital Object Identifier: 10.1214/07-BA220

Keywords: Dirichlet process, Hidden Markov model, Hierarchical DP, MCMC, population structure, recombination, SNP, statistical genetics

Rights: Copyright © 2007 International Society for Bayesian Analysis

Vol. 2 • No. 3 • September 2007