
Oracle inequalities for high-dimensional prediction

Johannes Lederer, Lu Yu, and Irina Gaynanova


Abstract

The abundance of high-dimensional data in the modern sciences has generated tremendous interest in penalized estimators such as the lasso, scaled lasso, square-root lasso, elastic net, and many others. In this paper, we establish a general oracle inequality for prediction in high-dimensional linear regression with such methods. Since the proof relies only on convexity and continuity arguments, the result holds irrespective of the design matrix and applies to a wide range of penalized estimators. Overall, the bound demonstrates that generic estimators can provide consistent prediction with any design matrix. From a practical point of view, the bound can help to identify the potential of specific estimators and to gauge the prediction accuracy in a given application.
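To illustrate the flavor of such a statement (this is the standard "slow-rate" bound for the lasso, not the paper's general theorem, and the penalty normalization below is one common convention), consider the linear model $y = X\beta^* + \varepsilon$ with $n \times p$ design matrix $X$ and the lasso $\hat{\beta} \in \arg\min_{\beta} \{ \|y - X\beta\|_2^2 / n + 2\lambda\|\beta\|_1 \}$. A direct convexity argument yields, for any tuning parameter $\lambda \ge \|X^\top \varepsilon\|_\infty / n$,

$$\frac{1}{n}\,\|X(\hat{\beta} - \beta^*)\|_2^2 \;\le\; 4\lambda\,\|\beta^*\|_1,$$

with no assumptions on the design matrix $X$. For Gaussian noise, a choice of $\lambda$ of order $\sigma\sqrt{\log(p)/n}$ satisfies the condition with high probability, so the in-sample prediction error vanishes whenever $\|\beta^*\|_1 = o(\sqrt{n/\log p})$.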

Article information

Source
Bernoulli, Volume 25, Number 2 (2019), 1225-1255.

Dates
Received: April 2017
Revised: December 2017
First available in Project Euclid: 6 March 2019

Permanent link to this document
https://projecteuclid.org/euclid.bj/1551862849

Digital Object Identifier
doi:10.3150/18-BEJ1019

Mathematical Reviews number (MathSciNet)
MR3920371

Zentralblatt MATH identifier
07049405

Keywords
high-dimensional regression; oracle inequalities; prediction

Citation

Lederer, Johannes; Yu, Lu; Gaynanova, Irina. Oracle inequalities for high-dimensional prediction. Bernoulli 25 (2019), no. 2, 1225--1255. doi:10.3150/18-BEJ1019. https://projecteuclid.org/euclid.bj/1551862849


