The Annals of Statistics

Information-theoretic determination of minimax rates of convergence

Yuhong Yang and Andrew Barron



We present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.

Article information

Ann. Statist., Volume 27, Number 5 (1999), 1564-1599.

First available in Project Euclid: 23 September 2004

Primary: 62G07: Density estimation
Secondary: 62B10: Information-theoretic topics [See also 94A17]; 62C20: Minimax procedures; 94A29: Source coding [See also 68P30]

Keywords: minimax risk; density estimation; metric entropy; Kullback-Leibler distance


Yang, Yuhong; Barron, Andrew. Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 (1999), no. 5, 1564-1599. doi:10.1214/aos/1017939142.


