The Annals of Statistics

CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality

Abstract

Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance.

Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
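The abstract above refers to EM-based clustering of Gaussian mixtures. As a minimal, hypothetical sketch of the classical EM iteration for a two-component mixture with a shared spherical covariance (not the CHIME procedure itself, which combines a penalized EM update with a direct estimate of the sparse discriminant vector), the idea can be illustrated as follows; all function and variable names here are illustrative:

```python
import numpy as np

def em_gmm_two_component(X, n_iter=50, seed=0):
    """Classical EM for a two-component Gaussian mixture with a shared
    spherical covariance. A simplified illustration only; CHIME itself
    uses a penalized EM with a sparse discriminant vector."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Initialize: one random point, then the point farthest from it,
    # so the two starting centers are well separated.
    i0 = rng.integers(n)
    i1 = ((X - X[i0]) ** 2).sum(axis=1).argmax()
    mu = np.stack([X[i0], X[i1]])
    w = np.array([0.5, 0.5])
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior (responsibility) of each component for each point.
        d = np.stack([((X - mu[k]) ** 2).sum(axis=1) for k in range(2)], axis=1)
        log_resp = np.log(w) - d / (2.0 * sigma2)
        log_resp -= log_resp.max(axis=1, keepdims=True)  # stabilize exp
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and the shared variance.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        d = np.stack([((X - mu[k]) ** 2).sum(axis=1) for k in range(2)], axis=1)
        sigma2 = (resp * d).sum() / (n * p)
    return w, mu, resp.argmax(axis=1)
```

In the high-dimensional setting studied in the paper, this unpenalized M-step is unreliable, which motivates CHIME's regularized update for the discriminant direction.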

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1234–1267.

Dates
First available in Project Euclid: 13 February 2019

https://projecteuclid.org/euclid.aos/1550026835

Digital Object Identifier
doi:10.1214/18-AOS1711

Mathematical Reviews number (MathSciNet)
MR3911111

Zentralblatt MATH identifier
07053507

Subjects
Primary: 62G15: Tolerance and confidence regions
Secondary: 62C20: Minimax procedures 62H35: Image analysis

Citation

Cai, T. Tony; Ma, Jing; Zhang, Linjun. CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality. Ann. Statist. 47 (2019), no. 3, 1234–1267. doi:10.1214/18-AOS1711. https://projecteuclid.org/euclid.aos/1550026835

References

• [1] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
• [2] Azizyan, M., Singh, A. and Wasserman, L. (2013). Minimax theory for high-dimensional Gaussian mixtures with sparse mean separation. In NIPS 2139–2147.
• [3] Azizyan, M., Singh, A. and Wasserman, L. (2015). Efficient sparse clustering of high-dimensional non-spherical Gaussian mixtures. In AISTATS.
• [4] Balakrishnan, S., Wainwright, M. J. and Yu, B. (2017). Statistical guarantees for the EM algorithm: From population to sample-based analysis. Ann. Statist. 45 77–120.
• [5] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• [6] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York.
• [7] Bouveyron, C. and Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: A review. Comput. Statist. Data Anal. 71 52–78.
• [8] Bradley, P. S., Fayyad, U. M. and Mangasarian, O. L. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS J. Comput. 11 217–238.
• [9] Cai, T. and Liu, W. (2011). A direct estimation approach to sparse linear discriminant analysis. J. Amer. Statist. Assoc. 106 1566–1577.
• [10] Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
• [11] Cai, T. T., Ma, J. and Zhang, L. (2019). Supplement to “CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality.” DOI:10.1214/18-AOS1711SUPP.
• [12] Cai, T. T. and Zhang, L. (2018). High-dimensional Gaussian copula regression: Adaptive estimation and statistical inference. Statist. Sinica 28 963–993.
• [13] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38.
• [14] Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis, Vol. 3. Wiley, New York.
• [15] Everitt, B. S. (1981). Finite Mixture Distributions. Wiley, New York.
• [16] Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc. 97 611–631.
• [17] Ge, R., Huang, Q. and Kakade, S. M. (2015). Learning mixtures of Gaussians in high dimensions [extended abstract]. In STOC’15—Proceedings of the 2015 ACM Symposium on Theory of Computing 761–770. ACM, New York.
• [18] Hardt, M. and Price, E. (2015). Tight bounds for learning a mixture of two Gaussians [extended abstract]. In STOC’15—Proceedings of the 2015 ACM Symposium on Theory of Computing 753–760. ACM, New York.
• [19] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• [20] Jin, C., Zhang, Y., Balakrishnan, S., Wainwright, M. J. and Jordan, M. I. (2016). Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. In NIPS 4116–4124.
• [21] Jin, J., Ke, Z. T. and Wang, W. (2017). Phase transitions for high dimensional clustering and related problems. Ann. Statist. 45 2151–2189.
• [22] Jin, J. and Wang, W. (2016). Influential features PCA for high dimensional clustering. Ann. Statist. 44 2323–2359.
• [23] Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications. NSF–CBMS Regional Conference Series in Probability and Statistics 5, Inst. Math. Statist., Hayward, CA; Amer. Statist. Assoc., Alexandria, VA.
• [24] Mai, Q., Yang, Y. and Zou, H. (2018). Multiclass sparse discriminant analysis. Statist. Sinica. To appear.
• [25] Mai, Q., Zou, H. and Yuan, M. (2012). A direct approach to sparse discriminant analysis in ultra-high dimensions. Biometrika 99 29–42.
• [26] Moitra, A. and Valiant, G. (2010). Settling the polynomial learnability of mixtures of Gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science—FOCS 2010 93–102. IEEE Comput. Soc., Los Alamitos, CA.
• [27] Neykov, M., Ning, Y., Liu, J. S. and Liu, H. (2015). A unified theory of confidence regions and testing for high dimensional estimating equations. Preprint. Available at arXiv:1510.08986.
• [28] Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. Ser. A 185 71–110.
• [29] Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 195–239.
• [30] Reynolds, D. (2015). Gaussian mixture models. In Encyclopedia of Biometrics 827–832. Springer, New York.
• [31] Scott, A. J. and Symons, M. J. (1971). Clustering methods based on likelihood ratio criteria. Biometrics 27 387–397.
• [32] Tian, L. and Gu, Q. (2017). Communication-efficient distributed sparse linear discriminant analysis. In AISTATS. Proceedings of Machine Learning Research 54 1178–1187.
• [33] Tibshirani, R. and Walther, G. (2005). Cluster validation by prediction strength. J. Comput. Graph. Statist. 14 511–528.
• [34] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York. Revised and extended from the 2004 French original, translated by Vladimir Zaiats.
• [35] Verhaak, R. G. W., Hoadley, K. A., Purdom, E., Wang, V., Qi, Y., Wilkerson, M. D., Miller, C. R., Ding, L., Golub, T., Mesirov, J. P. et al. (2010). Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17 98–110.
• [36] Wang, Z., Gu, Q., Ning, Y. and Liu, H. (2015). High dimensional expectation-maximization algorithm: Statistical optimization and asymptotic normality. In NIPS 2521–2529.
• [37] Ward, J. H. Jr. (1963). Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58 236–244.
• [38] Witten, D. M. and Tibshirani, R. (2010). A framework for feature selection in clustering. J. Amer. Statist. Assoc. 105 713–726.
• [39] Yi, X. and Caramanis, C. (2015). Regularized EM algorithms: A unified framework and statistical guarantees. In NIPS 1567–1575.
• [40] Zhou, H., Pan, W. and Shen, X. (2009). Penalized model-based clustering with unconstrained covariance matrices. Electron. J. Stat. 3 1473–1496.

Supplemental materials

• Supplement to “CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality”. This supplement provides detailed proofs of Theorems 3.1 and 3.3, which give the upper and lower bounds, respectively, for the estimation error of $\boldsymbol{\beta}^{*}$. All technical lemmas used throughout the paper are also proved. In addition, the supplement provides further simulation results.