The Annals of Applied Statistics

Integrative Model-based clustering of microarray methylation and expression data

Matthias Kormaksson, James G. Booth, Maria E. Figueroa, and Ari Melnick

Full-text: Open access


In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and discover biologically distinct groups. In this article we develop a model-based method for clustering such data. The basis of our method involves the construction of a likelihood for any given partition of the subjects. We introduce cluster-specific latent indicators that, along with some standard assumptions, impose a specific mixture distribution on each cluster. Estimation is carried out using the EM algorithm. The methods extend naturally to multiple data types of a similar nature, which leads to an integrated analysis over multiple data platforms, resulting in higher discriminating power.
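
As a rough illustration of the kind of EM estimation the abstract mentions, the sketch below fits a generic two-component, one-dimensional Gaussian mixture to simulated data. It is not the authors' model: the function name `em_two_gaussians`, the quartile-based initialisation, the variance floor, and the simulated data are all assumptions made for this example, and the partition likelihood and cluster-specific latent indicators of the paper are not reproduced here.

```python
import numpy as np


def em_two_gaussians(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM (generic illustration)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialisation: means at the quartiles, common spread, equal weights.
    mu = np.percentile(x, [25, 75])
    sigma = np.full(2, x.std())
    pi = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and posterior membership probabilities.
        z = (x[:, None] - mu) / sigma
        dens = pi * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        total = dens.sum(axis=1)
        resp = dens / total[:, None]
        # Log-likelihood at the parameters used in this E-step.
        ll = np.log(total).sum()
        # M-step: update mixing weights, means and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(np.maximum(var, 1e-12))  # floor avoids a zero variance
        # Stop once the log-likelihood no longer improves noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, sigma, resp


# Simulated data: two well-separated groups (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
pi_hat, mu_hat, sigma_hat, resp = em_two_gaussians(x)
labels = resp.argmax(axis=1)  # hard cluster assignment from the posterior
```

In this simplified setting, `resp` plays the role of the posterior probabilities of the latent component indicators, and `labels` gives a hard clustering derived from them.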

Article information

Ann. Appl. Stat., Volume 6, Number 3 (2012), 1327-1347.

First available in Project Euclid: 31 August 2012

Keywords: Integrative model-based clustering; microarray data; mixture models; EM algorithm; methylation; expression; AML


Kormaksson, Matthias; Booth, James G.; Figueroa, Maria E.; Melnick, Ari. Integrative Model-based clustering of microarray methylation and expression data. Ann. Appl. Stat. 6 (2012), no. 3, 1327-1347. doi:10.1214/11-AOAS533.

Supplemental materials

  • Supplementary material: Simulation and details of EM algorithms. We perform a simulation study to assess the performance of our clustering algorithm in the presence of sparse correlation structure. We also derive the steps involved in maximizing the likelihoods of the several models presented in this paper.