The Annals of Statistics

From ɛ-entropy to KL-entropy: Analysis of minimum information complexity density estimation

Tong Zhang



We consider an extension of ɛ-entropy to a KL-divergence-based complexity measure for randomized density estimation methods. Based on this extension, we develop a general information-theoretic inequality that measures the statistical complexity of some deterministic and randomized density estimators. We then present consequences of the new inequality. In particular, we show that this technique can improve some classical results on the convergence of minimum description length and Bayesian posterior distributions. Moreover, we are able to derive clean finite-sample convergence bounds that are not obtainable with previous approaches.
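The complexity measure discussed in the abstract is built on the Kullback–Leibler divergence. As a quick illustration of that quantity (not code from the paper), here is a minimal Python sketch of the KL divergence D(p‖q) between two discrete distributions given as probability vectors over a common support:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    p, q: sequences of probabilities over the same support.
    Terms with p_i = 0 contribute 0 by the usual convention 0 * log 0 = 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Example: divergence of a biased coin from a fair coin (in nats).
print(round(kl_divergence([0.7, 0.3], [0.5, 0.5]), 4))  # ≈ 0.0823
```

Note that D(p‖q) is asymmetric and is zero exactly when p = q, which is what makes it usable as a complexity/discrepancy measure in place of the metric balls underlying classical ɛ-entropy.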

Article information

Ann. Statist., Volume 34, Number 5 (2006), 2180-2210.

First available in Project Euclid: 23 January 2007


Primary: 62C10 (Bayesian problems; characterization of Bayes procedures); 62G07 (Density estimation)

Keywords: Bayesian posterior distribution; minimum description length; density estimation


Zhang, Tong. From ɛ-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 (2006), no. 5, 2180--2210. doi:10.1214/009053606000000704.


