Open Access
Model-based clustering with envelopes (2020)
Wenjing Wang, Xin Zhang, Qing Mai
Electron. J. Statist. 14(1): 82-109 (2020). DOI: 10.1214/19-EJS1652

Abstract

Clustering analysis is an important unsupervised learning technique in multivariate statistics and machine learning. In this paper, we propose a set of new mixture models called CLEMM (short for Clustering with Envelope Mixture Models) that builds on the widely used Gaussian mixture model assumptions and on the nascent research area of envelope methodology. Formulated mostly for regression models, envelope methodology aims at simultaneous dimension reduction and efficient parameter estimation, and it includes a very recent formulation of the envelope discriminant subspace for classification and discriminant analysis. Motivated by the pursuit of the envelope discriminant subspace in classification, we consider parsimonious probabilistic mixture models in which cluster analysis can be improved by projecting the data onto a latent lower-dimensional subspace. The proposed CLEMM framework and the associated envelope-EM algorithms thus provide foundations for envelope methods in unsupervised and semi-supervised learning problems. Numerical studies on simulated data and two benchmark data sets show significant improvement of our proposed methods over classical methods, such as Gaussian mixture models, K-means and hierarchical clustering algorithms. An R package is available at https://github.com/kusakehan/CLEMM.
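The abstract's central idea, that clustering can improve once the data are projected onto a latent lower-dimensional subspace, can be illustrated with a minimal Python/NumPy sketch. It assumes an envelope-type Gaussian mixture in which the cluster means satisfy μ_k − μ ∈ span(Γ) and the covariances take the form Σ_k = Γ Ω_k Γᵀ + Γ₀ Ω₀ Γ₀ᵀ, with (Γ, Γ₀) an orthogonal matrix. This exact parameterization is our assumption and is not quoted from the paper. Under it, the log-density splits into a cluster-dependent term in Γᵀx and a cluster-free term in Γ₀ᵀx, so the EM posterior cluster probabilities depend on the data only through Γᵀx. The sketch therefore runs ordinary Gaussian-mixture EM on XΓ with Γ treated as known. The envelope-EM algorithms proposed in the paper also estimate the subspace, which this sketch does not attempt. The names `gaussian_logpdf`, `projected_gmm_em` and `Gamma` are hypothetical and are not part of the CLEMM R package.

```python
import numpy as np


def gaussian_logpdf(Z, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of Z."""
    d = Z.shape[1]
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, (Z - mean).T)  # whitened residuals, shape (d, n)
    maha = np.sum(sol**2, axis=0)  # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def projected_gmm_em(X, Gamma, K, n_iter=100, seed=0):
    """EM for a K-component Gaussian mixture on the projected data X @ Gamma.

    Hypothetical helper: Gamma (p x u, orthonormal columns) is treated as a known
    basis of the cluster-relevant subspace; it is NOT estimated here.
    """
    rng = np.random.default_rng(seed)
    Z = X @ Gamma  # n x u projected ("material") coordinates
    n, u = Z.shape
    # Initialise: K distinct observations as means, pooled covariance, equal weights.
    means = Z[rng.choice(n, size=K, replace=False)].astype(float)
    pooled = np.cov(Z, rowvar=False).reshape(u, u)
    covs = np.array([pooled.copy() for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, stabilised by subtracting the row max.
        log_r = np.column_stack(
            [np.log(weights[k]) + gaussian_logpdf(Z, means[k], covs[k]) for k in range(K)]
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixing weights, means and covariances.
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ Z) / Nk[:, None]
        for k in range(K):
            D = Z - means[k]
            covs[k] = (resp[:, k, None] * D).T @ D / Nk[k] + 1e-6 * np.eye(u)
    return resp.argmax(axis=1), means


# Toy simulation: cluster information lives only in coordinate 0, while the
# other 9 coordinates are high-variance noise shared by both clusters.
rng = np.random.default_rng(1)
n_per, p = 200, 10
labels = np.repeat([0, 1], n_per)
X = rng.normal(scale=5.0, size=(2 * n_per, p))
X[:, 0] = rng.normal(size=2 * n_per) + np.where(labels == 0, -3.0, 3.0)
Gamma = np.zeros((p, 1))
Gamma[0, 0] = 1.0  # assumed known subspace basis for this toy example
pred, fitted_means = projected_gmm_em(X, Gamma, K=2)
# Cluster labels are only defined up to permutation, so score both matchings.
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy: {acc:.3f}")
```

The projection discards the nine noise directions, which carry no cluster information in this simulated example; the printed accuracy comes from this toy setup only and is not a result reported in the paper.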

Citation


Wenjing Wang, Xin Zhang, Qing Mai. "Model-based clustering with envelopes." Electron. J. Statist. 14(1): 82-109, 2020. https://doi.org/10.1214/19-EJS1652

Information

Received: 1 December 2018; Published: 2020
First available in Project Euclid: 3 January 2020

zbMATH: 1434.62135
MathSciNet: MR4047595
Digital Object Identifier: 10.1214/19-EJS1652

Keywords: clustering, computational statistics, dimension reduction, envelope methods, Gaussian mixture models
