Bernoulli, Volume 4, Number 3 (1998), 329-375.

Minimum contrast estimators on sieves: exponential bounds and rates of convergence

Lucien Birgé and Pascal Massart

Full-text: Open access


This paper, which we dedicate to Lucien Le Cam for his seventieth birthday, has been written in the spirit of his pioneering works on the relationships between the metric structure of the parameter space and the rate of convergence of optimal estimators, and in his honour as a contribution to his theory. It develops further the theory of minimum contrast estimators elaborated in a previous paper. We focus on minimum contrast estimators on sieves, where by a 'sieve' we mean an approximating space of the set of parameters. The sieves commonly used in practice are D-dimensional linear spaces generated by some basis: piecewise polynomials, wavelets, Fourier expansions, etc. It was recently pointed out that nonlinear sieves should also be considered, since they provide better spatial adaptation (think of histograms built from an arbitrary partition of [0,1] into D subintervals as a typical example). We introduce metric assumptions which are closely related to the notion of a finite-dimensional metric space in the sense of Le Cam. These assumptions are satisfied by the examples of practical interest and allow us to compute sharp rates of convergence for minimum contrast estimators.
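The histogram example above can be made concrete: over the sieve of densities that are piecewise constant on D equal-width subintervals of [0,1], the maximum likelihood (minimum contrast) estimator is the ordinary histogram with heights equal to the bin frequencies divided by the bin width. The sketch below is ours, not taken from the paper, and the function name and equal-width partition are illustrative assumptions (the paper also allows arbitrary partitions).

```python
import numpy as np

def histogram_mle(x, D):
    """Illustrative sketch: the MLE over the sieve of densities piecewise
    constant on D equal-width bins of [0, 1]. The height on bin j is
    (count_j / n) / (1 / D), i.e. the empirical frequency over the bin width."""
    x = np.asarray(x)
    n = x.size
    counts, edges = np.histogram(x, bins=D, range=(0.0, 1.0))
    heights = counts / n * D  # frequency divided by bin width 1/D
    return heights, edges

# Usage: estimate a density on [0, 1] from simulated data.
rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=1000)
heights, edges = histogram_mle(sample, D=8)
# The estimate integrates to 1: sum(heights) * (1/D) == 1.
```

Choosing D trades approximation error against estimation error; the rates of convergence discussed in the paper quantify this trade-off as D grows with the sample size.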

Article information


First available in Project Euclid: 19 March 2007

Keywords: empirical processes; finite-dimensional metric space; maximum likelihood estimation; minimum contrast estimators; nonparametric estimation; rates of convergence; sieves


Birgé, Lucien; Massart, Pascal. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4 (1998), no. 3, 329--375.



References

  • [1] Assouad, P. (1983) Deux remarques sur l'estimation. C. R. Acad. Sci. Paris Sér. I Math., 296, 1021-1024.
  • [2] Bahadur, R.R. (1958) Examples of inconsistency of maximum likelihood estimates. Sankhyā Ser. A, 20, 207-210.
  • [3] Barron, A.R. (1994) Approximation and estimation bounds for artificial neural networks. Mach. Learning, 14, 115-133.
  • [4] Barron, A.R. and Sheu, C.-H. (1991) Approximation of density functions by sequences of exponential families. Ann. Statist., 19, 1347-1369.
  • [5] Barron, A.R., Birgé, L. and Massart, P. (1997) Risk bounds for model selection via penalization. Probab. Theory Related Fields. To appear.
  • [6] Birgé, L. (1983) Approximation dans les espaces métriques et théorie de l'estimation. Z. Wahrscheinlichkeitstheorie Verw. Geb., 65, 181-237.
  • [7] Birgé, L. (1986) On estimating a density using Hellinger distance and some other strange facts. Probab. Theory Related Fields, 71, 271-291.
  • [8] Birgé, L. and Massart, P. (1993) Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields, 97, 113-150.
  • [9] Birgé, L. and Massart, P. (1997) From model selection to adaptive estimation. In D. Pollard, E. Torgersen and G. Yang (eds), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, pp. 55-87. New York: Springer-Verlag.
  • [10] Birman, M.S. and Solomjak, M.Z. (1967) Piecewise-polynomial approximation of functions of the classes Wp. Mat. Sb., 73, 295-317.
  • [11] Cencov, N.N. (1962) Evaluation of an unknown distribution density from observations. Soviet Math., 3, 1559-1562.
  • [12] Chow, Y.-S. and Grenander, U. (1985) A sieve method for the spectral density. Ann. Statist., 13, 998-1010.
  • [13] Cox, D.D. (1988) Approximation of least squares regression on nested subspaces. Ann. Statist., 16, 713-732.
  • [14] DeVore, R.A. and Lorentz, G.G. (1993) Constructive Approximation. Berlin: Springer-Verlag.
  • [15] DeVore, R.A., Jawerth, B. and Popov, V. (1992) Compression of wavelet decompositions. Amer. J. Math., 114, 737-785.
  • [16] Donoho, D.L. and Johnstone, I.M. (1994) Ideal spatial adaptation via wavelet shrinkage. Biometrika, 81, 425-455.
  • [17] Donoho, D.L. and Johnstone, I.M. (1995) Minimax estimation via wavelet shrinkage. Ann. Statist. To appear.
  • [18] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1996) Density estimation by wavelet thresholding. Ann. Statist., 24, 508-539.
  • [19] Dudley, R.M. (1978) Central limit theorems for empirical measures. Ann. Probab., 6, 899-929.
  • [20] Dudley, R.M. (1984) A course on empirical processes. In École d'Été de Probabilités de Saint-Flour XII - 1982, Lecture Notes in Math. 1097. Berlin: Springer-Verlag.
  • [21] Feller, W. (1968) An Introduction to Probability Theory and its Applications Vol I, 3rd edn. New York: Wiley.
  • [22] Geman, S. (1981) Sieves for nonparametric estimation of densities and regression. Rep. Pattern Analysis, no. 99. DAM, Brown University.
  • [23] Geman, S. and Hwang, C.-R. (1982) Nonparametric maximum likelihood estimation by the method of sieves. Ann. Statist., 10, 401-414.
  • [24] Grenander, U. (1981) Abstract Inference. New York: Wiley.
  • [25] Le Cam, L.M. (1973) Convergence of estimates under dimensionality restrictions. Ann. Statist., 1, 38-53.
  • [26] Le Cam, L.M. (1975) On local and global properties in the theory of asymptotic normality of experiments. In M. Puri (ed.), Stochastic Processes and Related Topics, Vol. 1, pp. 13-54. New York: Academic Press.
  • [27] Le Cam, L.M. (1986) Asymptotic Methods in Statistical Decision Theory. New York: Springer-Verlag.
  • [28] Le Cam, L.M. and Yang, G.L. (1990) Asymptotics in Statistics: Some Basic Concepts. New York: Springer-Verlag.
  • [29] Ledoux, M. (1996) On Talagrand's deviation inequalities for product measures. ESAIM: Probab. Statist., 1, 63-87.
  • [30] Meyer, Y. (1990) Ondelettes et Opérateurs I. Paris: Hermann.
  • [31] Nemirovskii, A.S., Polyak, B.T. and Tsybakov, A.B. (1984) Signal processing by the nonparametric maximum-likelihood method. Problems Inform. Transmission, 20, 177-192.
  • [32] Ossiander, M. (1987) A central limit theorem under metric entropy with L2 bracketing. Ann. Probab., 15, 897-919.
  • [33] Shen, X. and Wong, W.H. (1994) Convergence rates of sieve estimates. Ann. Statist., 22, 580-615.
  • [34] Stone, C.J. (1982) Optimal rates of convergence for nonparametric regression. Ann. Statist., 10, 1040-1053.
  • [35] Stone, C.J. (1990) Large-sample inference for log-spline models. Ann. Statist., 18, 717-741.
  • [36] Stone, C.J. (1994) The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Statist., 22, 118-184.
  • [37] Talagrand, M. (1996) New concentration inequalities in product spaces. Invent. Math., 126, 505-563.
  • [38] Uspensky, J.V. (1937) Introduction to Mathematical Probability. New York: McGraw-Hill.
  • [39] Van de Geer, S. (1990) Estimating a regression function. Ann. Statist., 18, 907-924.
  • [40] Van de Geer, S. (1995) The method of sieves and minimum contrast estimators. Math. Methods Statist., 4, 20-38.
  • [41] Wong, W.H. and Shen, X. (1992) Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Technical report, University of Chicago.
  • [42] Wong, W.H. and Shen, X. (1995) Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann. Statist., 23, 339-362.