Electronic Journal of Statistics

A comprehensive approach to mode clustering

Yen-Chi Chen, Christopher R. Genovese, and Larry Wasserman

Full-text: Open access


Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator’s modes. We provide several enhancements to mode clustering: (i) a soft variant of cluster assignment, (ii) a measure of connectivity between clusters, (iii) a technique for choosing the bandwidth, (iv) a method for denoising small clusters, and (v) an approach to visualizing the clusters. Combining all these enhancements gives us a complete procedure for clustering in multivariate problems. We also compare mode clustering to other clustering methods in several examples.
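As a rough illustration of the basic idea only (not the authors' implementation, and covering none of the enhancements (i) through (v)), the sketch below assigns each point to a mode of a Gaussian kernel density estimate by iterating the mean shift update. The function name `mean_shift_modes`, the parameters `n_iter`, `tol`, and `merge_tol`, and the default merge radius of `h / 2` are all assumptions made for this example.

```python
import numpy as np

def mean_shift_modes(X, h, n_iter=200, tol=1e-6, merge_tol=None):
    """Assign each row of X to a mode of a Gaussian KDE with bandwidth h.

    Hypothetical helper for illustration; returns (modes, labels).
    """
    X = np.asarray(X, dtype=float)
    Z = X.copy()  # every point starts its ascent at its own location
    for _ in range(n_iter):
        # Squared distances between current iterates and all data points.
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        W = np.exp(-0.5 * d2 / h**2)  # Gaussian kernel weights
        # Mean shift update: move each iterate to the weighted mean of the data.
        Z_new = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.abs(Z_new - Z).max()
        Z = Z_new
        if shift < tol:
            break

    # Iterates that end up within merge_tol of each other are treated as
    # the same mode (assumed heuristic; the radius is a free choice).
    if merge_tol is None:
        merge_tol = h / 2
    modes = []
    labels = np.empty(len(X), dtype=int)
    for i, z in enumerate(Z):
        for k, m in enumerate(modes):
            if np.linalg.norm(z - m) < merge_tol:
                labels[i] = k
                break
        else:
            modes.append(z)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```

Points that share a label lie in the same estimated basin of attraction. The bandwidth `h` controls how many modes, and therefore clusters, appear, which is why bandwidth selection is one of the enhancements listed above.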

Article information

Electron. J. Statist., Volume 10, Number 1 (2016), 210-241.

Received: July 2015
First available in Project Euclid: 17 February 2016

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1455715961

Digital Object Identifier: doi:10.1214/15-EJS1102

Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62G07: Density estimation; 62G99: None of the above, but in this section

Keywords: Kernel density estimation; mean shift clustering; nonparametric clustering; soft clustering; visualization


Chen, Yen-Chi; Genovese, Christopher R.; Wasserman, Larry. A comprehensive approach to mode clustering. Electron. J. Statist. 10 (2016), no. 1, 210–241. doi:10.1214/15-EJS1102. https://projecteuclid.org/euclid.ejs/1455715961


