The Annals of Statistics

Parametric estimation. Finite sample theory

Vladimir Spokoiny


The paper reconsiders the famous Le Cam LAN (local asymptotic normality) theory. The approach differs from the classical one in two main respects: (1) the study is nonasymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption may be misspecified, so the underlying data distribution can lie outside the given parametric family. These two features make it possible to bridge the gap between parametric and nonparametric theory and to build a unified framework for statistical estimation. The main results include large deviation bounds for the (quasi) maximum likelihood and a local quadratic bracketing of the log-likelihood process. The latter yields a number of important corollaries for statistical inference: concentration, confidence and risk bounds, expansion of the maximum likelihood estimate, etc. All these corollaries are stated in a nonclassical way that allows for model misspecification and finite samples. However, the classical asymptotic results, including the efficiency bounds, can be easily derived as corollaries of the nonasymptotic statements. At the same time, the new bracketing device works well in situations with large or growing parameter dimension, in which the classical parametric theory fails. The general results are illustrated for the i.i.d. setup as well as for generalized linear and median estimation. The results apply for any dimension of the parameter space and provide a quantitative lower bound on the sample size yielding root-n accuracy.
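As a rough numerical illustration of what a local quadratic approximation of the log-likelihood means at a fixed sample size, the sketch below uses a correctly specified i.i.d. Poisson model. It is not the bracketing construction of the paper: the model, the function names, the radius of the local neighborhood, and the grid search are all assumptions made for this example. It measures the largest gap between the excess log-likelihood L(theta) - L(theta_hat) and its quadratic approximation over the set where the standardized distance from the maximum likelihood estimate is at most a fixed radius.

```python
import math
import random


def poisson_sample(rate, n, rng):
    """Draw n Poisson(rate) variates with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    draws = []
    for _ in range(n):
        count, product = 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        draws.append(count)
    return draws


def excess_loglik(theta, theta_hat, n):
    """L(theta) - L(theta_hat) for i.i.d. Poisson data with sample mean theta_hat."""
    return n * (theta_hat * math.log(theta / theta_hat) - (theta - theta_hat))


def quadratic_approx(theta, theta_hat, n):
    """Quadratic approximation with observed Fisher information n / theta_hat."""
    return -0.5 * (n / theta_hat) * (theta - theta_hat) ** 2


def max_gap(theta_hat, n, radius, grid=201):
    """Largest |excess - quadratic| over sqrt(n / theta_hat) * |theta - theta_hat| <= radius."""
    scale = math.sqrt(theta_hat / n)
    worst = 0.0
    for k in range(grid):
        t = -radius + 2.0 * radius * k / (grid - 1)
        theta = theta_hat + t * scale
        if theta <= 0.0:
            continue  # outside the Poisson parameter space
        gap = abs(excess_loglik(theta, theta_hat, n) - quadratic_approx(theta, theta_hat, n))
        worst = max(worst, gap)
    return worst


def gaps_by_sample_size(sizes, rate=3.0, radius=2.0, seed=0):
    """Simulate one sample per size and return the corresponding approximation gaps."""
    rng = random.Random(seed)
    gaps = []
    for n in sizes:
        data = poisson_sample(rate, n, rng)
        theta_hat = sum(data) / n  # maximum likelihood estimate of the rate
        gaps.append(max_gap(theta_hat, n, radius))
    return gaps


for n, gap in zip([50, 500, 5000], gaps_by_sample_size([50, 500, 5000])):
    print(f"n = {n:5d}   max approximation gap = {gap:.4f}")
```

In this toy model the gap is an explicit, nonzero quantity at every finite n and shrinks as n grows, which is the kind of finite-sample error term that the abstract describes being controlled rather than sent to zero.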

Article information

Ann. Statist., Volume 40, Number 6 (2012), 2877-2909.

First available in Project Euclid: 8 February 2013

Digital Object Identifier: doi:10.1214/12-AOS1054

Primary: 62F10: Point estimation
Secondary: 62J12: Generalized linear models; 62F25: Tolerance and confidence regions; 62H12: Estimation

Maximum likelihood; local quadratic bracketing; deficiency; concentration


Spokoiny, Vladimir. Parametric estimation. Finite sample theory. Ann. Statist. 40 (2012), no. 6, 2877–2909. doi:10.1214/12-AOS1054.


Supplemental materials

  • Supplementary material: Some results from the theory of empirical processes. This part collects some general deviation bounds for non-Gaussian quadratic forms and for general centered random processes used in the text.