Open Access
December 2010
Gamma-based clustering via ordered means with application to gene-expression analysis
Michael A. Newton, Lisa M. Chung
Ann. Statist. 38(6): 3217-3244 (December 2010). DOI: 10.1214/10-AOS805

Abstract

Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite catalog of structures, each one representing equality and inequality constraints among latent expected values. Computations depend on the probability that independent gamma-distributed variables attain each of their possible orderings. Each ordering event is equivalent to an event in independent negative-binomial random variables, and this finding guides a dynamic-programming calculation. The structuring of mixture-model components according to constraints among latent means leads to strict concavity of the mixture log likelihood. In addition to its beneficial numerical properties, the clustering method shows promising results in an empirical study.
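
As a small illustration of the link between gamma orderings and negative-binomial events, consider only the two-variable case with integer shape parameters. If X and Y are independent with X ~ Gamma(shape a1, rate r1) and Y ~ Gamma(shape a2, rate r2), then X and Y can be viewed as the a1-th and a2-th arrival times of two independent Poisson processes, and X < Y exactly when fewer than a2 arrivals of the second process precede the a1-th arrival of the first. That count is negative binomial with success probability r1 / (r1 + r2). The sketch below is not the dynamic-programming calculation of the paper; the function names and the Monte Carlo check are assumptions introduced here for illustration.

```python
import math
import random


def prob_gamma_less(a1, rate1, a2, rate2):
    """P(X < Y) for independent X ~ Gamma(a1, rate1), Y ~ Gamma(a2, rate2).

    Assumes integer shapes a1, a2 >= 1 (hypothetical helper, two-variable case only).
    """
    p = rate1 / (rate1 + rate2)  # chance the next arrival comes from process 1
    # K = number of process-2 arrivals before the a1-th process-1 arrival
    # follows a negative-binomial law; X < Y is the event K <= a2 - 1.
    return sum(
        math.comb(a1 + k - 1, k) * p**a1 * (1 - p) ** k
        for k in range(a2)
    )


def monte_carlo_gamma_less(a1, rate1, a2, rate2, n=100_000, seed=0):
    """Simulation estimate of P(X < Y), used only as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate
        x = rng.gammavariate(a1, 1.0 / rate1)
        y = rng.gammavariate(a2, 1.0 / rate2)
        hits += x < y
    return hits / n


if __name__ == "__main__":
    exact = prob_gamma_less(3, 1.0, 2, 2.0)
    approx = monte_carlo_gamma_less(3, 1.0, 2, 2.0)
    print(f"negative-binomial sum: {exact:.4f}, simulation: {approx:.4f}")
```

With unit shapes the sum reduces to the familiar exponential-race probability r1 / (r1 + r2), which gives a quick check on the formula.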

Citation

Michael A. Newton, Lisa M. Chung. "Gamma-based clustering via ordered means with application to gene-expression analysis." Ann. Statist. 38(6): 3217-3244, December 2010. https://doi.org/10.1214/10-AOS805

Information

Published: December 2010
First available in Project Euclid: 20 September 2010

zbMATH: 1233.62002
MathSciNet: MR2766851
Digital Object Identifier: 10.1214/10-AOS805

Subjects:
Primary: 60E15, 62F99
Secondary: 62P10

Keywords: Gamma ranking, mixture model, next generation sequencing, Poisson embedding, rank probability

Rights: Copyright © 2010 Institute of Mathematical Statistics

Vol. 38 • No. 6 • December 2010