Open Access
March 2008
How many clusters?
Peter McCullagh, Jie Yang
Bayesian Anal. 3(1): 101-120 (March 2008). DOI: 10.1214/08-BA304

Abstract

The title poses a deceptively simple question that must be addressed by any statistical model or computational algorithm for the clustering of points. Two distinct interpretations are possible, one connected with the number of clusters in the sample and one with the number in the population. Under suitable conditions, these questions may have essentially the same answer, but it is logically possible for one answer to be finite and the other infinite. This paper reformulates the standard Dirichlet allocation model as a cluster process in such a way that these and related questions can be addressed directly. Our conclusion is that the data are sometimes informative for clustering points in the sample, but they seldom contain much information about parameters such as the number of clusters in the population.

Citation

Peter McCullagh, Jie Yang. "How many clusters?" Bayesian Anal. 3 (1) 101-120, March 2008. https://doi.org/10.1214/08-BA304

Information

Published: March 2008
First available in Project Euclid: 22 June 2012

zbMATH: 1330.62033
MathSciNet: MR2383253
Digital Object Identifier: 10.1214/08-BA304

Keywords: cluster process, Dirichlet partition, Gauss-Ewens process, random sub-clusters, species-counting model

Rights: Copyright © 2008 International Society for Bayesian Analysis

Vol. 3 • No. 1 • March 2008