The Annals of Applied Statistics

Modeling association in microbial communities with clique loglinear models

Adrian Dobra, Camilo Valdes, Dragana Ajdic, Bertrand Clarke, and Jennifer Clarke

There is a growing awareness of the important roles that microbial communities play in complex biological processes. Modern investigations of these communities often use next-generation sequencing of metagenomic samples to determine community composition. We propose a statistical technique, based on clique loglinear models and Bayes model averaging, that identifies microbial components of a metagenomic sample with significant associations at various taxonomic levels. We describe the model class, a stochastic search technique for model selection, and the calculation of estimates of the posterior probabilities of interest. We demonstrate our approach using data from the Human Microbiome Project and from a study of the skin microbiome in chronic wound healing. Our technique also identifies significant dependencies among microbial components as evidence of possible microbial syntrophy.
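Bayes model averaging turns posterior probabilities over many candidate models into posterior probabilities for individual features, such as an association between two microbial components. The probability of an association is the total posterior mass of the models that contain it. The Python sketch below illustrates only this generic calculation. It assumes that each model visited by a search is represented as a set of pairwise associations and carries an unnormalized log posterior score. The model representation, function names, taxa labels, and scores are hypothetical. This is not the authors' implementation, and it does not fit clique loglinear models.

```python
import math
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[str, str]   # hypothetical: an association between two taxa
Model = FrozenSet[Edge]  # hypothetical: a model as a set of associations


def edge(a: str, b: str) -> Edge:
    """Store each association in a canonical (sorted) order."""
    first, second = sorted((a, b))
    return (first, second)


def model_posteriors(log_scores: Dict[Model, float]) -> Dict[Model, float]:
    """Renormalize unnormalized log posterior scores over the visited models."""
    top = max(log_scores.values())  # subtract the max to avoid underflow
    weights = {m: math.exp(s - top) for m, s in log_scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def inclusion_probability(posteriors: Dict[Model, float], e: Edge) -> float:
    """Bayes model averaging: total posterior mass of the models containing e."""
    return sum(p for m, p in posteriors.items() if e in m)


# Toy, made-up scores for three hypothetical models over taxa A, B, C.
log_scores = {
    frozenset({edge("A", "B")}): -12.0,
    frozenset({edge("A", "B"), edge("B", "C")}): -12.0,
    frozenset(): -13.5,
}

post = model_posteriors(log_scores)
p_ab = inclusion_probability(post, edge("A", "B"))
p_bc = inclusion_probability(post, edge("B", "C"))
print(f"P(A-B) = {p_ab:.3f}, P(B-C) = {p_bc:.3f}")
```

The A-B association receives the larger probability because both of the highest-scoring toy models contain it. Subtracting the largest log score before exponentiating keeps the weights numerically stable. Because the posteriors are renormalized only over the visited models, the resulting quantities are estimates rather than exact posterior probabilities.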

Article information

Ann. Appl. Stat., Volume 13, Number 2 (2019), 931-957.

Received: January 2018
Revised: November 2018
First available in Project Euclid: 17 June 2019

Keywords: contingency tables; graphical models; model selection; microbiome; next generation sequencing


Dobra, Adrian; Valdes, Camilo; Ajdic, Dragana; Clarke, Bertrand; Clarke, Jennifer. Modeling association in microbial communities with clique loglinear models. Ann. Appl. Stat. 13 (2019), no. 2, 931-957. doi:10.1214/18-AOAS1229.

Supplemental materials

  • Additional proofs, maps, figures and tables. This online supplementary material describes the data used and presents the computational experiments performed, the details of the simulations, and further details on the software developed for this article.