Sparse oracle inequalities for variable selection via regularized quantization

Clément Levrard

We give oracle inequalities for procedures that combine quantization and variable selection via a weighted Lasso $k$-means type algorithm. The results are derived for a general family of weights, which can be tuned to size the influence of the variables in different ways. Moreover, these theoretical guarantees are proved to adapt to the corresponding sparsity of the optimal codebooks, suggesting that these procedures might be of particular interest in high-dimensional settings. Even if there is no sparsity assumption on the optimal codebooks, our procedure is proved to be close to a sparse approximation of the optimal codebooks, as has been done for generalized linear models in regression. If the optimal codebooks have a sparse support, we also show that this support can be asymptotically recovered, and we provide an asymptotic consistency rate. These results are illustrated with Gaussian mixture models in arbitrary dimension with sparsity assumptions on the means, which are standard distributions in model-based clustering.
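To make the idea of a weighted Lasso $k$-means type procedure concrete, here is a minimal Python sketch. The abstract does not state the criterion, so the sketch assumes a plain coordinatewise $\ell_{1}$ penalty: it alternates nearest-center assignment with a soft-thresholded mean update so as to decrease $\frac{1}{n}\sum_{i}\|x_i - c_{a(i)}\|^2 + \lambda \sum_{p} w_p \sum_{j} |c_j^{(p)}|$. This objective, the function names, the parameter `lam` and the optional `init` argument are illustrative assumptions and may differ from the penalty and algorithm analysed in the article. The data are assumed to be centered, so that a zero coordinate means "not informative".

```python
import numpy as np


def soft_threshold(m, t):
    """Coordinatewise soft-thresholding: shrink m toward 0 by t."""
    return np.sign(m) * np.maximum(np.abs(m) - t, 0.0)


def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) per row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def weighted_lasso_kmeans(X, k, lam, weights, n_iter=50, init=None, seed=0):
    """Lloyd-type iterations for an l1-penalised k-means objective (assumed form).

    X       : (n, d) array of centered observations
    weights : (d,) nonnegative weights w_p, one per variable
    lam     : penalty level (hypothetical tuning parameter)
    init    : optional (k, d) array of starting centers
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n, _ = X.shape
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)

    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous center for an empty cell
            # Minimising (n_j/n)(c - mean)^2 + lam * w_p * |c| in each
            # coordinate gives a soft-threshold at lam * w_p * n / (2 n_j).
            thresh = lam * weights * n / (2.0 * len(members))
            centers[j] = soft_threshold(members.mean(axis=0), thresh)

    return centers, nearest_center(X, centers)
```

A variable can then be declared selected when at least one center has a nonzero coordinate for it, for instance with `np.any(centers != 0, axis=0)`. On a two-component Gaussian mixture whose means differ only on a few coordinates, the remaining coordinates of the centers are typically shrunk exactly to zero once `lam` exceeds the noise level of the cell means.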

Article information

Bernoulli, Volume 24, Number 1 (2018), 271-296.

Received: April 2015
Revised: May 2016
First available in Project Euclid: 27 July 2017

Keywords

clustering; high dimension; $k$-means; Lasso; oracle inequalities; sparsity; variable selection


Levrard, Clément. Sparse oracle inequalities for variable selection via regularized quantization. Bernoulli 24 (2018), no. 1, 271–296. doi:10.3150/16-BEJ876.


References

  • [1] Antoniadis, A., Brossat, X., Cugliari, J. and Poggi, J.-M. (2013). Clustering functional data using wavelets. Int. J. Wavelets Multiresolut. Inf. Process. 11 1350003, 30.
  • [2] Bach, F.R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179–1225.
  • [3] Biau, G., Devroye, L. and Lugosi, G. (2008). On the performance of clustering in Hilbert spaces. IEEE Trans. Inform. Theory 54 781–790.
  • [4] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer Series in Statistics. Heidelberg: Springer.
  • [5] Chang, X., Wang, Y., Li, R. and Xu, Z. (2014). Sparse $k$-means with $\ell _{\infty}/\ell _{0}$ penalty for high-dimensional data clustering. Available at arXiv:1403.7890.
  • [6] De Soete, G. and Carroll, J.D. (1994). $k$-means clustering in a low-dimensional Euclidean space. In New Approaches in Classification and Data Analysis (E. Diday, Y. Lechevallier, M. Schader, P. Bertrand and B. Burtschy, eds.). Studies in Classification, Data Analysis, and Knowledge Organization. 212–219. Heidelberg: Springer.
  • [7] Fischer, A. (2010). Quantization and clustering with Bregman divergences. J. Multivariate Anal. 101 2207–2221.
  • [8] Gersho, A. and Gray, R.M. (1991). Vector Quantization and Signal Compression. Norwell, MA: Kluwer Academic.
  • [9] Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions. Lecture Notes in Math. 1730. Berlin: Springer.
  • [10] Graf, S., Luschgy, H. and Pagès, G. (2007). Optimal quantizers for Radon random vectors in a Banach space. J. Approx. Theory 144 27–53.
  • [11] Jin, J. and Wang, W. (2014). Important feature PCA for high dimensional clustering. Available at arXiv:1407.5241.
  • [12] Levrard, C. (2015). Nonasymptotic bounds for vector quantization in Hilbert spaces. Ann. Statist. 43 592–619.
  • [13] Levrard, C. (2016). Supplement to “Sparse oracle inequalities for variable selection via regularized quantization.” DOI:10.3150/16-BEJ876SUPP.
  • [14] Lloyd, S.P. (1982). Least squares quantization in PCM. IEEE Trans. Inform. Theory 28 129–137.
  • [15] Massart, P. and Meynet, C. (2011). The Lasso as an $\ell_{1}$-ball model selection procedure. Electron. J. Stat. 5 669–687.
  • [16] Maugis-Rabusseau, C. and Michel, B. (2013). Adaptive density estimation for clustering with Gaussian mixtures. ESAIM Probab. Stat. 17 698–724.
  • [17] Meynet, C. (2013). An $\ell_{1}$-oracle inequality for the Lasso in finite mixture Gaussian regression models. ESAIM Probab. Stat. 17 650–671.
  • [18] Pollard, D. (1981). Strong consistency of $k$-means clustering. Ann. Statist. 9 135–140.
  • [19] Pollard, D. (1982). A central limit theorem for $k$-means clustering. Ann. Probab. 10 919–926.
  • [20] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771.
  • [21] Steinley, D. and Brusco, M.J. (2008). Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika 73 125–144.
  • [22] Sun, W., Wang, J. and Fang, Y. (2012). Regularized $k$-means clustering of high-dimensional data and its asymptotic consistency. Electron. J. Stat. 6 148–167.
  • [23] Terada, Y. (2014). Strong consistency of reduced $k$-means clustering. Scand. J. Stat. 41 913–931.
  • [24] Terada, Y. (2015). Strong consistency of factorial $k$-means clustering. Ann. Inst. Statist. Math. 67 335–357.
  • [25] Timmerman, M.E., Ceulemans, E., Kiers, H.A.L. and Vichi, M. (2010). Factorial and reduced $k$-means reconsidered. Comput. Statist. Data Anal. 54 1858–1871.
  • [26] van de Geer, S. (2013). Generic chaining and the $\ell_{1}$-penalty. J. Statist. Plann. Inference 143 1001–1012.
  • [27] van de Geer, S.A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
  • [28] Vichi, M. and Kiers, H.A.L. (2001). Factorial $k$-means analysis for two-way data. Comput. Statist. Data Anal. 37 49–64.
  • [29] Witten, D.M. and Tibshirani, R. (2010). A framework for feature selection in clustering. J. Amer. Statist. Assoc. 105 713–726.

Supplemental materials

  • Appendix: Remaining proofs. Due to space constraints, we relegate technical details of the remaining proofs to the supplement [13].