## Bernoulli

• Volume 25, Number 4A (2019), 3016-3040.

### Localized Gaussian width of $M$-convex hulls with applications to Lasso and convex aggregation

Pierre C. Bellec

#### Abstract

Upper and lower bounds are derived for the Gaussian mean width of a convex hull of $M$ points intersected with a Euclidean ball of a given radius. The upper bound holds for any collection of extreme points bounded in Euclidean norm. The upper and lower bounds match up to a multiplicative constant whenever the extreme points satisfy a one-sided Restricted Isometry Property.

An appealing aspect of the upper bound is that no assumption on the covariance structure of the extreme points is needed. This aspect is especially useful for studying regression problems with anisotropic design distributions. We provide applications of this bound to the Lasso estimator in fixed-design regression, the Empirical Risk Minimizer in the anisotropic persistence problem, and the convex aggregation problem in density estimation.
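
For orientation, the quantity described in the abstract can be written under one common convention for the Gaussian mean width. The symbols $\mu_1,\dots,\mu_M$, $T$, $s$, $B_2$, $g$ and $n$ are notation assumed here for illustration and do not appear in the abstract itself; the paper may use a different normalization.

$$
\ell(T \cap sB_2) \;=\; \mathbb{E} \sup_{u \in T \cap sB_2} g^\top u,
\qquad T = \mathrm{conv}(\mu_1,\dots,\mu_M) \subset \mathbb{R}^n,
\qquad g \sim N(0, I_n),
$$

where $B_2$ denotes the unit Euclidean ball in $\mathbb{R}^n$ and $s > 0$ is the localization radius.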

#### Article information

Source
Bernoulli, Volume 25, Number 4A (2019), 3016-3040.

Dates
Received: November 2017
Revised: June 2018
First available in Project Euclid: 13 September 2019

Permanent link to this document
https://projecteuclid.org/euclid.bj/1568362050

Digital Object Identifier
doi:10.3150/18-BEJ1078

Mathematical Reviews number (MathSciNet)
MR4003572

Zentralblatt MATH identifier
07110119

#### Citation

Bellec, Pierre C. Localized Gaussian width of $M$-convex hulls with applications to Lasso and convex aggregation. Bernoulli 25 (2019), no. 4A, 3016–3040. doi:10.3150/18-BEJ1078. https://projecteuclid.org/euclid.bj/1568362050
