Electronic Journal of Statistics

Optimal model selection in heteroscedastic regression using piecewise polynomial functions

Adrien Saumard


We consider the estimation of a regression function with random design and heteroscedastic noise in a nonparametric setting. More precisely, we address the problem of characterizing the optimal penalty when the regression function is estimated by using a penalized least-squares model selection method. In this context, we show the existence of a minimal penalty, defined to be the maximum level of penalization under which the model selection procedure totally misbehaves. The optimal penalty is shown to be twice the minimal one and to satisfy a non-asymptotic pathwise oracle inequality with leading constant almost one. Finally, the ideal penalty being unknown in general, we propose a hold-out penalization procedure and show that the latter is asymptotically optimal.
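As a rough illustration of the penalized least-squares selection described above, the following sketch fits piecewise constant functions (piecewise polynomials of degree 0) on nested regular partitions and selects a dimension by minimizing the empirical risk plus a penalty. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the sine regression function, the noise level, and in particular the penalty shape `c * d / n`. A penalty proportional to the dimension is the classical homoscedastic choice, whereas under heteroscedastic noise the ideal penalty need not be proportional to the dimension.

```python
import numpy as np

def fit_histogram(x, y, d):
    """Least-squares fit on piecewise constants over d regular bins of [0, 1)."""
    bins = np.minimum((x * d).astype(int), d - 1)
    sums = np.bincount(bins, weights=y, minlength=d)
    counts = np.bincount(bins, minlength=d)
    # Bin-wise means; empty bins get the value 0 (they hold no design point).
    means = np.divide(sums, counts, out=np.zeros(d), where=counts > 0)
    return means[bins]  # fitted values at the design points

def select_dimension(x, y, dims, c):
    """Minimize empirical risk + c * d / n (hypothetical penalty shape)."""
    n = len(y)
    crit = [np.mean((y - fit_histogram(x, y, d)) ** 2) + c * d / n for d in dims]
    return dims[int(np.argmin(crit))]

rng = np.random.default_rng(0)
n = 512
x = rng.uniform(size=n)                 # random design
sigma = 0.2 + 0.8 * x                   # heteroscedastic noise level (illustrative)
y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)
dims = [1, 2, 4, 8, 16, 32, 64]         # nested dyadic partitions

d_none = select_dimension(x, y, dims, c=0.0)   # no penalization
d_huge = select_dimension(x, y, dims, c=1e6)   # extreme penalization
```

With no penalty the criterion reduces to the empirical risk, which cannot increase along nested models, so the largest model is selected; this is the kind of misbehavior expected below the minimal penalty. With a very large constant the smallest model is selected. Intermediate values of `c` trade off the two regimes.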

Article information

Electron. J. Statist., Volume 7 (2013), 1184-1223.

First available in Project Euclid: 25 April 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1366896903

Digital Object Identifier
doi:10.1214/13-EJS803

Primary: 62G08: Nonparametric regression; 62G09: Resampling methods; 62J02: General nonlinear regression

Keywords: nonparametric regression, hold-out penalty, heteroscedastic noise, random design, optimal model selection, slope heuristics


Saumard, Adrien. Optimal model selection in heteroscedastic regression using piecewise polynomial functions. Electron. J. Statist. 7 (2013), 1184-1223. doi:10.1214/13-EJS803. https://projecteuclid.org/euclid.ejs/1366896903
