Open Access
March 2019
Bayesian hidden Markov tree models for clustering genes with shared evolutionary history
Yang Li, Shaoyang Ning, Sarah E. Calvo, Vamsi K. Mootha, Jun S. Liu
Ann. Appl. Stat. 13(1): 606-637 (March 2019). DOI: 10.1214/18-AOAS1208


Determining the functions of poorly characterized genes is crucial for understanding biological processes and studying human diseases. Functionally associated genes are often gained and lost together through evolution. Therefore, identifying co-evolution of genes can predict functional gene-gene associations. We describe here the full statistical model and computational strategies underlying the original algorithm CLustering by Inferred Models of Evolution (CLIME 1.0), recently reported by us (Cell 158 (2014) 213–225). CLIME 1.0 employs a mixture of tree-structured hidden Markov models for the gene evolution process, and a Bayesian model-based clustering algorithm to detect gene modules with shared evolutionary histories (termed evolutionarily conserved modules, or ECMs). A Dirichlet process prior was adopted for estimating the number of gene clusters, and a Gibbs sampler was developed for posterior sampling. We further developed an extended version, CLIME 1.1, to incorporate uncertainty in the evolutionary tree structure. Through simulation studies and benchmarks on real data sets, we show that CLIME 1.0 and CLIME 1.1 outperform traditional methods that use simple metrics (e.g., the Hamming distance or Pearson correlation) to measure co-evolution between pairs of genes.
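
To make the comparison concrete, the pairwise baseline metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the CLIME algorithm: it assumes each gene is summarized as a binary presence/absence profile across species (a representation assumed here, not stated in the abstract), and the gene names and profiles are hypothetical.

```python
from math import sqrt


def hamming_distance(a, b):
    """Count the species in which two presence/absence calls differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    return sum(x != y for x, y in zip(a, b))


def pearson_correlation(a, b):
    """Pearson correlation between two equal-length numeric profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_a * var_b)


# Hypothetical profiles over eight species (1 = present, 0 = absent).
profiles = {
    "geneA": [1, 1, 0, 0, 1, 1, 0, 1],
    "geneB": [1, 1, 0, 0, 1, 1, 0, 0],
    "geneC": [0, 0, 1, 1, 0, 1, 1, 0],
}

if __name__ == "__main__":
    names = sorted(profiles)
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            d = hamming_distance(profiles[g], profiles[h])
            r = pearson_correlation(profiles[g], profiles[h])
            print(f"{g} vs {h}: Hamming={d}, Pearson={r:.3f}")
```

Both metrics compare profiles position by position and treat species as interchangeable, so they take no account of the evolutionary tree relating the species. The tree-structured model described in the abstract is designed to use that information.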



Yang Li, Shaoyang Ning, Sarah E. Calvo, Vamsi K. Mootha, Jun S. Liu. "Bayesian hidden Markov tree models for clustering genes with shared evolutionary history." Ann. Appl. Stat. 13 (1) 606–637, March 2019.


Received: 1 June 2018; Revised: 1 August 2018; Published: March 2019
First available in Project Euclid: 10 April 2019

zbMATH: 07057441
MathSciNet: MR3937442
Digital Object Identifier: 10.1214/18-AOAS1208

Keywords: Co-evolution, Dirichlet process mixture model, evolutionary history, gene function prediction, tree-structured hidden Markov model

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 13 • No. 1 • March 2019