Bernoulli, Volume 23, Number 1 (2017), 552-581.

On the prediction performance of the Lasso

Arnak S. Dalalyan, Mohamed Hebiri, and Johannes Lederer



Abstract

Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that incorporating a simple correlation measure into the tuning parameter can lead to nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. Finally, we show that our results also lead to near-optimal rates for the least-squares estimator with total variation penalty.
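The abstract's central idea — scaling the Lasso tuning parameter by a simple correlation measure of the design — can be illustrated with a minimal sketch. The code below is not the paper's exact rule: the Lasso solver is a plain coordinate-descent implementation, and the tuning choice (the universal level sigma * sqrt(2 log p / n), inflated by the maximum absolute off-diagonal correlation) is a hypothetical stand-in for whatever correlation measure the authors propose.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso: min_b (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column normalization
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed, then soft-threshold.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

# Synthetic sparse regression problem.
rng = np.random.default_rng(0)
n, p, s, sigma = 100, 50, 5, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + sigma * rng.standard_normal(n)

# A simple correlation summary of the design: the largest absolute
# off-diagonal entry of the empirical correlation matrix.
G = np.corrcoef(X, rowvar=False)
max_corr = np.max(np.abs(G - np.eye(p)))

# Correlation-inflated tuning parameter (illustrative choice only).
lam = sigma * np.sqrt(2 * np.log(p) / n) * (1 + max_corr)

beta_hat = lasso_cd(X, y, lam)
pred_err = np.mean((X @ (beta_hat - beta_true)) ** 2)
```

Here `pred_err` is the in-sample prediction error that the paper's oracle inequalities bound; on this well-conditioned Gaussian design it is small, reflecting the near-optimal regime the abstract describes for a suitably tuned Lasso.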

Article information

Received: October 2014
Revised: July 2015
First available in Project Euclid: 27 September 2016


Keywords: multiple linear regression; oracle inequalities; sparse recovery; total variation penalty


Dalalyan, Arnak S.; Hebiri, Mohamed; Lederer, Johannes. On the prediction performance of the Lasso. Bernoulli 23 (2017), no. 1, 552--581. doi:10.3150/15-BEJ756.

