## The Annals of Statistics

### “Local” vs. “global” parameters—breaking the Gaussian complexity barrier

Shahar Mendelson

#### Abstract

We show that if $F$ is a convex class of functions that is $L$-sub-Gaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by “local” covering estimates of the class (i.e., the covering number at a specific level), rather than by the Gaussian average, which takes into account the structure of $F$ at an arbitrarily small scale. To that end, we establish new sharp upper and lower estimates on the error rate in such learning problems.
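To make the contrast concrete, here is a schematic only (the symbols $n$ for the sample size, $\sigma$ for the noise level, $N(F,\varepsilon)$ for the $L_2$ covering number, and $\{G_f : f\in F\}$ for the canonical Gaussian process indexed by $F$ are notation assumed here, not taken from the abstract): the “global” parameter is the Gaussian average, which Dudley's entropy bound controls using covering numbers at every scale, while a “local” fixed point depends on the covering number at a single level,

$$
\mathbb{E}\sup_{f\in F} G_f \;\le\; c\int_0^{\infty}\sqrt{\log N(F,\varepsilon)}\,d\varepsilon
\qquad\text{vs.}\qquad
r^{\ast} \;=\; \inf\Big\{r>0 \,:\, \log N(F,r)\le \frac{nr^{2}}{\sigma^{2}}\Big\}.
$$

The entropy integral is sensitive to the structure of $F$ at arbitrarily small scales, whereas $r^{\ast}$ is determined by a covering estimate at the single scale $r^{\ast}$ itself. The fixed point displayed here is only an illustrative form in the spirit of the Yang–Barron condition $\log N(F,r)\sim nr^{2}/\sigma^{2}$, not the paper's exact definition.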

#### Article information

Source
Ann. Statist., Volume 45, Number 5 (2017), 1835–1862.

Dates
Revised: August 2016
First available in Project Euclid: 31 October 2017

https://projecteuclid.org/euclid.aos/1509436820

Digital Object Identifier
doi:10.1214/16-AOS1510

Mathematical Reviews number (MathSciNet)
MR3718154

Zentralblatt MATH identifier
06821111

#### Citation

Mendelson, Shahar. “Local” vs. “global” parameters—breaking the Gaussian complexity barrier. Ann. Statist. 45 (2017), no. 5, 1835–1862. doi:10.1214/16-AOS1510. https://projecteuclid.org/euclid.aos/1509436820

#### References

• [1] Anthony, M. and Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge Univ. Press, Cambridge.
• [2] Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields 97 113–150.
• [3] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• [4] Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics 63. Cambridge Univ. Press, Cambridge.
• [5] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Math. 2033. Springer, Heidelberg.
• [6] Lecué, G. and Mendelson, S. (2013). Learning subgaussian classes: Upper and minimax bounds. Technical report, CNRS, Ecole polytechnique and Technion.
• [7] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin.
• [8] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin.
• [9] Mendelson, S. (2008). Obtaining fast error rates in nonconvex situations. J. Complexity 24 380–397.
• [10] Mendelson, S. (2014). Learning without concentration for general loss functions. Preprint. Available at arXiv:1410.3192.
• [11] Mendelson, S. (2015). Learning without concentration. J. ACM 62 Art. 21, 25.
• [12] Mendelson, S. (2016). Upper bounds on product and multiplier empirical processes. Stochastic Process. Appl. 126 3652–3680.
• [13] Mendelson, S. (2017). Supplement to “‘Local’ vs. ‘global’ parameters—breaking the Gaussian complexity barrier.” DOI:10.1214/16-AOS1510SUPP.
• [14] Mendelson, S., Pajor, A. and Tomczak-Jaegermann, N. (2007). Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal. 17 1248–1282.
• [15] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.
• [16] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
• [17] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564–1599.

#### Supplemental materials

• Supplement to “‘Local’ vs. ‘global’ parameters—breaking the Gaussian complexity barrier.” We prove two observations: the first shows that the setup of the Yang–Barron theorem differs from the one studied here, and the second that for $p>1$ there is a true gap between the “local” and “global” complexities of $B_{p}^{n}$ (a schematic of the two complexities follows below).
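A minimal sketch of the setting of the second observation (notation assumed here, not taken from the supplement): by Hölder duality, the “global” complexity of $B_{p}^{n}$ is the Gaussian average

$$
\ell(B_{p}^{n}) \;=\; \mathbb{E}\sup_{t\in B_{p}^{n}}\langle g,t\rangle \;=\; \mathbb{E}\,\|g\|_{q},
\qquad \tfrac{1}{p}+\tfrac{1}{q}=1, \quad g\sim N(0,I_{n}),
$$

which reflects the geometry of the ball at every scale, whereas the “local” complexity at level $r$ involves only the single covering number $\log N(B_{p}^{n}, rD)$, with $D$ the Euclidean unit ball. The supplement's claim is that for $p>1$ there is a true gap between these two complexities.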