Bernoulli, Volume 15, Number 2 (2009), 475-507.

Estimating the joint distribution of independent categorical variables via model selection

C. Durot, E. Lebarbier, and A.-S. Tocquet

Assume one observes independent categorical variables or, equivalently, the corresponding multinomial variables. Estimating the distribution of the observed sequence amounts to estimating the expectation of the multinomial sequence. A new estimator for this mean is proposed that is nonparametric, non-asymptotic and implementable even for large sequences. It is a penalized least-squares estimator based on wavelets, with a penalization term inspired by papers of Birgé and Massart. The estimator is proved to satisfy an oracle inequality and to be adaptive in the minimax sense over a class of Besov bodies. The method is embedded in a general framework that also allows us to recover an existing method for segmentation. Beyond theoretical results, a simulation study is reported and an application to real data is provided.
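
As a rough illustration of the setting described in the abstract (and not the authors' estimator), the sketch below encodes a categorical sequence as multinomial indicator vectors and fits a piecewise-constant least-squares estimate of their mean on dyadic blocks, choosing the block size by a penalized criterion. The function names, the family of equal-size dyadic blocks, and the penalty `c * n_blocks * r` are all assumptions made for illustration; the paper works with a wavelet-based collection of models and a penalty inspired by Birgé and Massart.

```python
def one_hot(seq, categories):
    """Turn each categorical observation into a multinomial indicator vector."""
    index = {c: j for j, c in enumerate(categories)}
    rows = []
    for x in seq:
        row = [0.0] * len(categories)
        row[index[x]] = 1.0
        rows.append(row)
    return rows


def block_means(rows, block_size):
    """Least-squares fit that is constant on consecutive blocks (block averages)."""
    r = len(rows[0])
    fit = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        mean = [sum(row[j] for row in block) / len(block) for j in range(r)]
        fit.extend([mean] * len(block))
    return fit


def penalized_fit(rows, c=1.0):
    """Pick the dyadic block size minimizing residual sum of squares + penalty.

    The penalty c * (number of blocks) * (number of categories) is an
    illustrative assumption, not the penalty used in the paper.
    """
    n, r = len(rows), len(rows[0])
    best = None
    size = 1
    while size <= n:
        fit = block_means(rows, size)
        rss = sum((rows[i][j] - fit[i][j]) ** 2
                  for i in range(n) for j in range(r))
        n_blocks = -(-n // size)  # ceiling division
        crit = rss + c * n_blocks * r
        if best is None or crit < best[0]:
            best = (crit, size, fit)
        size *= 2
    return best[1], best[2]


# Toy sequence with one change in distribution halfway through.
seq = list("AAAAAAAACCCCCCCC")
rows = one_hot(seq, categories="ACGT")
size, fit = penalized_fit(rows)
```

On this toy sequence the criterion selects blocks of size 8, so the fitted mean is one probability vector for each half of the sequence. Each fitted row is an average of indicator vectors and therefore sums to one.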

Article information

First available in Project Euclid: 4 May 2009

Keywords: adaptive estimator; categorical variable; least-squares estimator; model selection; multinomial variable; penalized minimum contrast estimator; wavelet


Durot, C.; Lebarbier, E.; Tocquet, A.-S. Estimating the joint distribution of independent categorical variables via model selection. Bernoulli 15 (2009), no. 2, 475–507. doi:10.3150/08-BEJ155.

  • [1] Aerts, M. and Veraverbeke, N. (1995). Bootstrapping a nonparametric polytomous regression model. Math. Methods Statist. 4 189–200.
  • [2] Akakpo, N. (2008). Estimating a discrete distribution via histogram selection. Technical report, Univ. Paris Sud, Orsay.
  • [3] Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4 329–375.
  • [4] Birgé, L. and Massart, P. (2000). An adaptive compression algorithm in Besov spaces. Constr. Approx. 16 1–36.
  • [5] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3 203–268.
  • [6] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 33–73.
  • [7] Boucheron, S., Lugosi, G. and Massart, P. (2003). Concentration inequalities using the entropy method. Ann. Probab. 31 1583–1614.
  • [8] Braun, J.-V., Braun, R.-K. and Müller, H.-G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika 87 301–314.
  • [9] Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statist. 26 879–921.
  • [10] Fu, Y.-X. and Curnow, R.-N. (1990). Maximum likelihood estimation of multiple change points. Biometrika 77 563–573.
  • [11] Gey, S. and Lebarbier, E. (2008). Using CART to detect multiple change-points in the mean for large samples. Technical report, Preprint SSB, n12.
  • [12] Hoebeke, M., Nicolas, P. and Bessières, P. (2003). MuGeN: simultaneous exploration of multiple genomes and computer analysis results. Bioinformatics 19 859–864.
  • [13] Lebarbier, E. (2002). Quelques approches pour la détection de ruptures à horizon fini. Ph.D. thesis, Univ. Paris Sud, Orsay.
  • [14] Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Processing 85 717–736.
  • [15] Lebarbier, E. and Nédélec, E. (2007). Change-points detection for discrete sequences via model selection. Technical report, Preprint SSB, n9.
  • [16] Massart, P. (2007). Concentration Inequalities and Model Selection. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23. Lecture Notes in Math. 1896. Berlin: Springer.