## Bernoulli

### Oracle inequalities for high-dimensional prediction

#### Abstract

The abundance of high-dimensional data in the modern sciences has generated tremendous interest in penalized estimators such as the lasso, scaled lasso, square-root lasso, elastic net, and many others. In this paper, we establish a general oracle inequality for prediction in high-dimensional linear regression with such methods. Since the proof relies only on convexity and continuity arguments, the result holds irrespective of the design matrix and applies to a wide range of penalized estimators. Overall, the bound demonstrates that generic estimators can provide consistent prediction with any design matrix. From a practical point of view, the bound can help to identify the potential of specific estimators and to gauge the prediction accuracy in a given application.
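As a rough illustration of the setting only (not the paper's estimator, proof, or specific bound), the sketch below simulates a sparse high-dimensional linear model with more features than samples, fits scikit-learn's `Lasso` as a stand-in for the generic penalized estimators discussed in the abstract, and computes the in-sample prediction error, the quantity that oracle inequalities for prediction typically control. The dimensions, noise level, and tuning parameter are arbitrary choices for demonstration.

```python
# Illustration of high-dimensional prediction with a penalized estimator.
# Not the paper's method: dimensions, noise level, and alpha are ad hoc.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                 # samples, features, sparsity level (p > n)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0                        # s active coefficients
y = X @ beta + 0.5 * rng.standard_normal(n)

# Tuning parameter chosen for illustration; oracle inequalities typically
# hold for suitable ranges of the regularization parameter.
lasso = Lasso(alpha=0.1).fit(X, y)

# In-sample prediction error ||X(beta_hat - beta)||_2^2 / n, the quantity
# that oracle inequalities for prediction bound.
pred_err = np.mean((X @ (lasso.coef_ - beta)) ** 2)
print(f"in-sample prediction error: {pred_err:.4f}")
```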

#### Article information

**Source**
Bernoulli, Volume 25, Number 2 (2019), 1225–1255.

**Dates**
Revised: December 2017
First available in Project Euclid: 6 March 2019

**Permanent link to this document**
https://projecteuclid.org/euclid.bj/1551862849

**Digital Object Identifier**
doi:10.3150/18-BEJ1019

**Mathematical Reviews number (MathSciNet)**
MR3920371

**Zentralblatt MATH identifier**
07049405

#### Citation

Lederer, Johannes; Yu, Lu; Gaynanova, Irina. Oracle inequalities for high-dimensional prediction. Bernoulli 25 (2019), no. 2, 1225–1255. doi:10.3150/18-BEJ1019. https://projecteuclid.org/euclid.bj/1551862849
