Bernoulli, Volume 10, Number 6 (2004), 1039-1051.

Model selection for Gaussian regression with random design

Lucien Birgé

Abstract

This paper is concerned with Gaussian regression with random design, where the observations are independent and identically distributed. It is known from work by Le Cam that the rate of convergence of optimal estimators is closely connected to the metric structure of the parameter space with respect to the Hellinger distance. In particular, this metric structure essentially determines the risk when the loss function is a power of the Hellinger distance. For random-design regression, one typically uses as loss function the squared L2-distance between the estimator and the parameter. If the parameter space is bounded with respect to the L∞-norm, the two distances are equivalent. Without this assumption, there may be a large distortion between them, resulting in unusual rates of convergence for the squared L2-risk, as noticed by Baraud. We explain this phenomenon and then show that using the Hellinger distance instead of the L2-distance allows us to recover the usual rates and to carry out model selection in great generality. An extension to the L2-risk is given under a boundedness assumption similar to those of Wegkamp and Yang.
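To make the comparison in the abstract concrete, here is a standard sketch (notation ours, not taken from the paper) for the model Y = s(X) + ξ with ξ ~ N(0, σ²) independent of X ~ μ. The Hellinger distance between the joint laws of (X, Y) under regression functions s and t admits a closed form, from which both the general domination by the L2-distance and the equivalence under a sup-norm bound follow:

```latex
% Hellinger distance between the joint laws P_s, P_t of (X, Y),
% assuming Y = s(X) + \xi, \ \xi \sim \mathcal{N}(0,\sigma^2), \ X \sim \mu:
h^2(P_s, P_t)
  \;=\; 1 - \mathbb{E}_\mu\!\left[\exp\!\left(-\frac{(s(X)-t(X))^2}{8\sigma^2}\right)\right].
% Since 1 - e^{-u} \le u for u \ge 0, the Hellinger distance is always
% dominated by the L^2(\mu)-distance:
h^2(P_s, P_t) \;\le\; \frac{\|s-t\|_{L^2(\mu)}^2}{8\sigma^2}.
% Conversely, if \|s-t\|_\infty \le L, then u = (s(X)-t(X))^2/(8\sigma^2)
% stays in [0, a] with a = L^2/(8\sigma^2), and 1 - e^{-u} \ge (1-e^{-a})\,u/a
% there, so
h^2(P_s, P_t) \;\ge\; \frac{1-e^{-a}}{a}\cdot\frac{\|s-t\|_{L^2(\mu)}^2}{8\sigma^2},
% i.e. the two (squared) distances are equivalent up to constants -- the
% bounded case mentioned in the abstract. Without the sup-norm bound, the
% exponential saturates at 1 and the lower bound fails, which is the source
% of the distortion between the two risks.
```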

Article information

Source
Bernoulli, Volume 10, Number 6 (2004), 1039-1051.

Dates
First available in Project Euclid: 21 January 2005

Permanent link to this document
https://projecteuclid.org/euclid.bj/1106314849

Digital Object Identifier
doi:10.3150/bj/1106314849

Mathematical Reviews number (MathSciNet)
MR2108042

Zentralblatt MATH identifier
1064.62030

Keywords
Besov spaces; Hellinger distance; minimax risk; model selection; random design regression

Citation

Birgé, Lucien. Model selection for Gaussian regression with random design. Bernoulli 10 (2004), no. 6, 1039-1051. doi:10.3150/bj/1106314849. https://projecteuclid.org/euclid.bj/1106314849



References

  • [1] Baraud, Y. (2000) Model selection for regression on a fixed design. Probab. Theory Related Fields, 117, 467-493.
  • [2] Baraud, Y. (2002) Model selection for regression on a random design. ESAIM Probab. Statist., 6, 127-146.
  • [3] Barron, A.R., Birgé, L. and Massart, P. (1999) Risk bounds for model selection via penalization. Probab. Theory Related Fields, 113, 301-415.
  • [4] Birgé, L. (1983) Approximation dans les espaces métriques et théorie de l'estimation. Z. Wahrscheinlichkeitstheorie Verw. Geb., 65, 181-237.
  • [5] Birgé, L. (2003) Model selection via testing: an alternative to (penalized) maximum likelihood estimators. Preprint no. 862, Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI. http://www.proba.jussieu.fr/mathdoc/preprints/index.html.
  • [6] Birgé, L. and Massart, P. (1997) From model selection to adaptive estimation. In D. Pollard, E. Torgessen and G. Yang (eds), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, pp. 55-87. New York: Springer-Verlag.
  • [7] Birgé, L. and Massart, P. (2000) An adaptive compression algorithm in Besov spaces. Constr. Approx., 16, 1-36.
  • [8] Birgé, L. and Massart, P. (2001) Gaussian model selection. J. Eur. Math. Soc., 3, 203-268.
  • [9] Brown, L.D., Cai, T.T., Low, M.G. and Zhang, C.-H. (2002) Asymptotic equivalence theory for nonparametric regression with random design. Ann. Statist., 30, 688-707.
  • [10] DeVore, R.A. and Lorentz, G.G. (1993) Constructive Approximation. Berlin: Springer-Verlag.
  • [11] Donoho, D.L. and Liu, R.C. (1991) Geometrizing rates of convergence II. Ann. Statist., 19, 633-667.
  • [12] Donoho, D.L. and Johnstone, I.M. (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425-455.
  • [13] Juditsky, A. and Nemirovski, A.S. (2000) Functional aggregation for nonparametric regression. Ann. Statist., 28, 681-712.
  • [14] Kerkyacharian, G. and Picard, D. (2000) Thresholding algorithms, maxisets and well-concentrated bases. Test, 9, 283-344.
  • [15] Le Cam, L.M. (1973) Convergence of estimates under dimensionality restrictions. Ann. Statist., 1, 38-53.
  • [16] Le Cam, L.M. (1975) On local and global properties in the theory of asymptotic normality of experiments. In M. Puri (ed.), Stochastic Processes and Related Topics, Vol. 1, pp. 13-54. New York: Academic Press.
  • [17] Le Cam, L.M. (1986) Asymptotic Methods in Statistical Decision Theory. New York: Springer-Verlag.
  • [18] Wegkamp, M. (2003) Model selection in nonparametric regression. Ann. Statist., 31, 252-273.
  • [19] Yang, Y. (2000) Combining different procedures for adaptive regression. J. Multivariate Anal., 74, 135-161.
  • [20] Yang, Y. (2001) Adaptive regression by mixing. J. Amer. Statist. Assoc., 96, 574-588.
  • [21] Yang, Y. (2004) Aggregating regression procedures for a better performance. Bernoulli, 10, 25-47.
  • [22] Yang, Y. and Barron, A.R. (1998) An asymptotic property of model selection criteria. IEEE Trans. Inform. Theory, 44, 95-116.