Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

Penalized maximum likelihood estimation and effective dimension

Vladimir Spokoiny



This paper extends some prominent statistical results, including the Fisher theorem and the Wilks phenomenon, to penalized maximum likelihood estimation with a quadratic penalty. It appears that sharp expansions for the penalized MLE $\widetilde{\boldsymbol{\theta}}_{G}$ and for the penalized maximum likelihood can be obtained without involving any asymptotic arguments; the results rely only on smoothness and regularity properties of the considered log-likelihood function. The estimation error is specified in terms of the effective dimension $\mathtt{p}_{G}$ of the parameter set, which can be much smaller than the true parameter dimension and even allows for an infinite-dimensional functional parameter. In the i.i.d. case, the Fisher expansion for the penalized MLE can be established under the constraint that "$\mathtt{p}_{G}^{2}/n$ is small", while the remainder in the Wilks result is of order $\sqrt{\mathtt{p}_{G}^{3}/n}$.
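As a schematic illustration of the setting (the notation here is assumed for exposition and is not quoted verbatim from the paper): with a quadratic penalty, the penalized log-likelihood, the penalized MLE, and the effective dimension take roughly the following form.

```latex
% Sketch of the quadratic-penalty setup (assumed notation, for illustration only).
% L(theta): log-likelihood; G^2: penalty matrix; D^2: information-type matrix
% (negative expected Hessian of L at the target point).
\[
  L_{G}(\boldsymbol{\theta})
    \;=\; L(\boldsymbol{\theta}) \;-\; \tfrac{1}{2}\,\bigl\|G\boldsymbol{\theta}\bigr\|^{2},
  \qquad
  \widetilde{\boldsymbol{\theta}}_{G}
    \;=\; \operatorname*{argmax}_{\boldsymbol{\theta}} \, L_{G}(\boldsymbol{\theta}),
\]
\[
  D_{G}^{2} \;=\; D^{2} + G^{2},
  \qquad
  \mathtt{p}_{G} \;=\; \operatorname{tr}\!\bigl(D_{G}^{-2} D^{2}\bigr)
    \;\le\; \mathtt{p} .
\]
```

Since $\mathtt{p}_{G}=\operatorname{tr}(D_{G}^{-2}D^{2})$ counts each direction by how little the penalty dampens it, a strong penalty $G$ can make $\mathtt{p}_{G}$ finite, and indeed small, even when the nominal dimension $\mathtt{p}$ is infinite.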



Article information

Ann. Inst. H. Poincaré Probab. Statist., Volume 53, Number 1 (2017), 389-429.

Received: 29 July 2014
Revised: 24 August 2015
Accepted: 5 October 2015
First available in Project Euclid: 8 February 2017


Primary: 62F10: Point estimation
Secondary: 62J12: Generalized linear models; 62F25: Tolerance and confidence regions; 62H12: Estimation

Keywords: Penalty; Wilks and Fisher expansions


Spokoiny, Vladimir. Penalized maximum likelihood estimation and effective dimension. Ann. Inst. H. Poincaré Probab. Statist. 53 (2017), no. 1, 389--429. doi:10.1214/15-AIHP720.


