The Annals of Statistics

“Local” vs. “global” parameters—breaking the Gaussian complexity barrier

Shahar Mendelson


Abstract

We show that if $F$ is a convex class of functions that is $L$-sub-Gaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by “local” covering estimates of the class (i.e., the covering number at a specific level), rather than by the Gaussian average, which takes into account the structure of $F$ at an arbitrarily small scale. To that end, we establish new sharp upper and lower estimates on the error rate in such learning problems.
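For orientation, here is a schematic sketch of the two parameters being contrasted; the displays below are an illustrative reconstruction from standard definitions, and the normalization, the constant $c$ and the exact localization used in the paper may differ. The “global” parameter is the Gaussian average of the class,
$$\ell_{*}(F)=\mathbb{E}\sup_{f\in F}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}g_{i}f(X_{i}),\qquad g_{1},\dots,g_{n}\ \text{i.i.d. standard Gaussian},$$
which is sensitive to the metric structure of $F$ at every scale. The “local” parameter is a fixed point driven by a single covering number, of the schematic form
$$r^{*}=\inf\Bigl\{r>0:\log\mathcal{N}\bigl(F\cap rD,\;cr\bigr)\le\frac{nr^{2}}{\sigma^{2}}\Bigr\},$$
where $\mathcal{N}(A,\varepsilon)$ is the minimal number of $\varepsilon$-balls (in $L_{2}$) needed to cover $A$, $D$ is the $L_{2}$ unit ball centred at the minimizer, and $\sigma$ is the noise level.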

Article information

Source
Ann. Statist., Volume 45, Number 5 (2017), 1835–1862.

Dates
Received: October 2015
Revised: August 2016
First available in Project Euclid: 31 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1509436820

Digital Object Identifier
doi:10.1214/16-AOS1510

Mathematical Reviews number (MathSciNet)
MR3718154

Zentralblatt MATH identifier
06821111

Subjects
Primary: 62G08: Nonparametric regression; 62C20: Minimax procedures; 60G15: Gaussian processes

Keywords
Error rates; Gaussian averages; covering numbers

Citation

Mendelson, Shahar. “Local” vs. “global” parameters—breaking the Gaussian complexity barrier. Ann. Statist. 45 (2017), no. 5, 1835–1862. doi:10.1214/16-AOS1510. https://projecteuclid.org/euclid.aos/1509436820



References

  • [1] Anthony, M. and Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge Univ. Press, Cambridge.
  • [2] Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields 97 113–150.
  • [3] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
  • [4] Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics 63. Cambridge Univ. Press, Cambridge.
  • [5] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Math. 2033. Springer, Heidelberg.
  • [6] Lecué, G. and Mendelson, S. (2013). Learning subgaussian classes: Upper and minimax bounds. Technical report, CNRS, Ecole polytechnique and Technion.
  • [7] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin.
  • [8] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin.
  • [9] Mendelson, S. (2008). Obtaining fast error rates in nonconvex situations. J. Complexity 24 380–397.
  • [10] Mendelson, S. (2014). Learning without concentration for general loss functions. Preprint. Available at arXiv:1410.3192.
  • [11] Mendelson, S. (2015). Learning without concentration. J. ACM 62, Art. 21, 25 pp.
  • [12] Mendelson, S. (2016). Upper bounds on product and multiplier empirical processes. Stochastic Process. Appl. 126 3652–3680.
  • [13] Mendelson, S. (2017). Supplement to “‘Local’ vs. ‘global’ parameters—breaking the Gaussian complexity barrier.” DOI:10.1214/16-AOS1510SUPP.
  • [14] Mendelson, S., Pajor, A. and Tomczak-Jaegermann, N. (2007). Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal. 17 1248–1282.
  • [15] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.
  • [16] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • [17] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564–1599.

Supplemental materials

  • Supplement to “‘Local’ vs. ‘global’ parameters—breaking the Gaussian complexity barrier.” We prove two observations: the first shows that the setup of the Yang–Barron theorem is different from the one we study here, and the second shows that for $p>1$ there is a true gap between the “local” and “global” complexities of $B_{p}^{n}$ (an illustrative computation follows below).
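To see why such a gap is plausible, here is a hedged illustration using only a standard duality computation, not the supplement’s argument: for the unit ball $B_{p}^{n}$ of $\ell_{p}^{n}$, the global Gaussian average equals the mean dual norm of a standard Gaussian vector $g$,
$$\mathbb{E}\sup_{t\in B_{p}^{n}}\sum_{i=1}^{n}g_{i}t_{i}=\mathbb{E}\,\|g\|_{p'},\qquad \frac{1}{p}+\frac{1}{p'}=1,$$
which is of order $n^{1/p'}$ and hence grows polynomially in $n$ for every fixed $p>1$, whereas the “local” complexity is governed by the covering number of $B_{p}^{n}$ at a single scale; the supplement makes precise that the latter can be of a strictly smaller order.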