## Statistical Science

### On the Sensitivity of the Lasso to the Number of Predictor Variables

#### Abstract

The Lasso is a computationally efficient regression regularization procedure that can produce sparse estimators when the number of predictors $(p)$ is large. Oracle inequalities provide high-probability loss bounds for the Lasso estimator at a deterministic choice of the regularization parameter. These bounds tend to zero if $p$ is appropriately controlled, and are thus commonly cited as theoretical justification for the Lasso and its ability to handle high-dimensional settings. Unfortunately, in practice the regularization parameter is not selected to be a deterministic quantity, but is instead chosen using a random, data-dependent procedure. To address this shortcoming of previous theoretical work, we study the loss of the Lasso estimator when tuned optimally for prediction. Assuming orthonormal predictors and a sparse true model, we prove that the probability that the best possible predictive performance of the Lasso deteriorates as $p$ increases is positive and can be made arbitrarily close to one given a sufficiently high signal-to-noise ratio and sufficiently large $p$. We further demonstrate empirically that the amount of deterioration in performance can be far worse than the oracle inequalities suggest, and provide a real data example where deterioration is observed.
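To make the deterioration phenomenon concrete, here is a minimal simulation sketch of the abstract's setting: an orthonormal design, a true model with a single nonzero coefficient, and a Lasso tuned optimally for prediction at each $p$. This is not code from the paper; the function name, sample sizes, signal strength, and grid sizes are all illustrative choices. Under an orthonormal design the Lasso solution is available in closed form as soft-thresholding of the least-squares coefficients, so the optimally tuned loss can be found by a direct search over the regularization parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def best_lasso_loss(n, p, signal, sigma=1.0, n_lambdas=200, n_reps=50):
    """Mean prediction loss of the Lasso at its prediction-optimal tuning.

    Orthonormal design, one nonzero true coefficient; illustrative only.
    """
    losses = np.empty(n_reps)
    for r in range(n_reps):
        # Orthonormal design: X has orthonormal columns, so X^T X = I_p.
        X, _ = np.linalg.qr(rng.standard_normal((n, p)))
        beta = np.zeros(p)
        beta[0] = signal                      # sparse truth: one nonzero coefficient
        y = X @ beta + sigma * rng.standard_normal(n)
        z = X.T @ y                           # least-squares coefficients
        # Under orthonormality the Lasso solution is soft-thresholding of z,
        # and the prediction loss ||X(beta_hat - beta)||^2 = ||beta_hat - beta||^2.
        best = np.inf
        for lam in np.linspace(0.0, np.abs(z).max(), n_lambdas):
            beta_hat = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
            best = min(best, np.sum((beta_hat - beta) ** 2))
        losses[r] = best
    return losses.mean()

for p in (5, 50, 200, 450):
    print(f"p = {p:4d}: optimally tuned loss = {best_lasso_loss(500, p, signal=5.0):.3f}")
```

In runs of this sketch one should see the optimally tuned loss grow with $p$ even though the true model never changes: the extra pure-noise coefficients force a larger threshold, which in turn biases the signal coefficient. Raising `signal` strengthens the effect, consistent with the high signal-to-noise condition in the theorem.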

#### Article information

**Source:** Statist. Sci., Volume 32, Number 1 (2017), 88–105.

**Dates:** First available in Project Euclid: 6 April 2017

**Permanent link to this document:** https://projecteuclid.org/euclid.ss/1491465629

**Digital Object Identifier:** doi:10.1214/16-STS586

**Mathematical Reviews number (MathSciNet):** MR3634308

**Zentralblatt MATH identifier:** 06946265

#### Citation

Flynn, Cheryl J.; Hurvich, Clifford M.; Simonoff, Jeffrey S. On the Sensitivity of the Lasso to the Number of Predictor Variables. Statist. Sci. 32 (2017), no. 1, 88–105. doi:10.1214/16-STS586. https://projecteuclid.org/euclid.ss/1491465629

#### References

• Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.) 267–281. Akadémiai Kiadó, Budapest.
• Ando, T. and Li, K.-C. (2014). A model-averaging approach for high-dimensional regression. J. Amer. Statist. Assoc. 109 254–265.
• Bertsimas, D., King, A. and Mazumder, R. (2016). Best subset selection via a modern optimization lens. Ann. Statist. 44 813–852.
• Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• Bien, J., Taylor, J. and Tibshirani, R. (2013). A Lasso for hierarchical interactions. Ann. Statist. 41 1111–1141.
• Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212–1242.
• Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, Berlin.
• Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2006). Aggregation and sparsity via $l_{1}$ penalized least squares. In Learning Theory. Lecture Notes in Computer Science 4005 379–391. Springer, Berlin.
• Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007a). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
• Bunea, F., Tsybakov, A. and Wegkamp, M. (2007b). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $\ell_{1}$ minimization. Ann. Statist. 37 2145–2177.
• Chatterjee, S. (2014). A new perspective on least squares under convex constraint. Ann. Statist. 42 2340–2381.
• Craven, P. and Wahba, G. (1978). Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31 377–403.
• Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• Flynn, C. J., Hurvich, C. M. and Simonoff, J. S. (2013). Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J. Amer. Statist. Assoc. 108 1031–1043.
• Flynn, C. J., Hurvich, C. M. and Simonoff, J. S. (2016). Deterioration of performance of the Lasso with many predictors: Discussion of a paper by Tutz and Gertheiss. Stat. Model. 16 212–216.
• Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
• Greenshtein, E. (2006). Best subset selection, persistence in high-dimensional statistical learning and optimization under $l_{1}$ constraint. Ann. Statist. 34 2367–2386.
• Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
• Homrighausen, D. and McDonald, D. J. (2014). Leave-one-out cross-validation is risk consistent for lasso. Mach. Learn. 97 65–78.
• Hurvich, C. M. and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika 76 297–307.
• Hyndman, R. J., Booth, H. and Yasmeen, F. (2013). Coherent mortality forecasting: The product-ratio method with functional time series models. Demography 50 261–283.
• Leng, C., Lin, Y. and Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statist. Sinica 16 1273–1284.
• Lin, D., Foster, D. P. and Ungar, L. H. (2011). VIF regression: A fast regression algorithm for large data. J. Amer. Statist. Assoc. 106 232–247.
• Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
• Rhee, S.-Y., Taylor, J., Wadhera, G., Ben-Hur, A. and Brutlag, D. L. (2006). Genotypic predictors of human immunodeficiency virus type 1 drug resistance. Proc. Natl. Acad. Sci. USA 103 17355–17360.
• Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
• Thrampoulidis, C., Panahi, A. and Hassibi, B. (2015). Asymptotically exact error analysis for the generalized $l_{2}^{2}$-LASSO. Preprint. Available at arXiv:1502.06287.
• Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• Vidaurre, D., Bielza, C. and Larrañaga, P. (2013). A survey of $L_{1}$ regression. Int. Stat. Rev. 81 361–387.
• Yu, Y. and Feng, Y. (2014). Modified cross-validation for penalized high-dimensional linear regression models. J. Comput. Graph. Statist. 23 1009–1027.
• Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173–2192.