Statistical Science

On the Sensitivity of the Lasso to the Number of Predictor Variables

Cheryl J. Flynn, Clifford M. Hurvich, and Jeffrey S. Simonoff

Abstract

The Lasso is a computationally efficient regression regularization procedure that can produce sparse estimators when the number of predictors $(p)$ is large. Oracle inequalities provide probability loss bounds for the Lasso estimator at a deterministic choice of the regularization parameter. These bounds tend to zero if $p$ is appropriately controlled, and are thus commonly cited as theoretical justification for the Lasso and its ability to handle high-dimensional settings. Unfortunately, in practice the regularization parameter is not selected to be a deterministic quantity, but is instead chosen using a random, data-dependent procedure. To address this shortcoming of previous theoretical work, we study the loss of the Lasso estimator when tuned optimally for prediction. Assuming orthonormal predictors and a sparse true model, we prove that the probability that the best possible predictive performance of the Lasso deteriorates as $p$ increases is positive, and can be made arbitrarily close to one given a sufficiently high signal-to-noise ratio and sufficiently large $p$. We further demonstrate empirically that the amount of deterioration in performance can be far worse than the oracle inequalities suggest, and provide a real data example where this deterioration is observed.
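To make the orthonormal setting concrete: when $X^\top X = I_p$, the Lasso estimate at penalty level $\lambda$ is the soft-thresholded least-squares coefficient, $\hat\beta_j(\lambda) = \operatorname{sign}(z_j)\,(|z_j| - \lambda)_+$ with $z = X^\top y$, so the prediction loss at the best possible $\lambda$ can be traced exactly along the whole path. The sketch below is not the authors' code; the sample size, sparsity, noise level, and signal strength are illustrative assumptions. It simulates the optimally tuned loss as irrelevant orthonormal predictors are added, which is the kind of experiment in which the deterioration described in the abstract can be observed.

    import numpy as np

    # Illustrative simulation (not from the paper): best-possible Lasso prediction
    # loss with an orthonormal design, as irrelevant predictors are added.
    # n, k, sigma, and signal are assumptions chosen only for illustration.

    rng = np.random.default_rng(0)
    n, k, sigma, signal = 100, 5, 1.0, 3.0   # observations, true predictors, noise sd, coefficient size

    def best_tuned_loss(p, n_rep=200):
        """Average prediction loss of the Lasso at its prediction-optimal lambda."""
        losses = []
        for _ in range(n_rep):
            # Orthonormal columns (requires p <= n), so X'X = I_p.
            X, _ = np.linalg.qr(rng.standard_normal((n, p)))
            beta = np.zeros(p)
            beta[:k] = signal
            y = X @ beta + sigma * rng.standard_normal(n)
            z = X.T @ y                          # least-squares coefficients under orthonormality
            lambdas = np.linspace(0.0, np.abs(z).max(), 200)
            # Closed-form Lasso path: soft-thresholding of z at each lambda.
            beta_hat = np.sign(z)[:, None] * np.maximum(np.abs(z)[:, None] - lambdas, 0.0)
            # Prediction loss ||X(beta_hat - beta)||^2 reduces to the coefficient error here.
            loss = np.sum((beta_hat - beta[:, None]) ** 2, axis=0)
            losses.append(loss.min())            # optimal tuning for this data set
        return np.mean(losses)

    for p in (5, 10, 25, 50, 100):
        print(p, round(best_tuned_loss(p), 3))

Using the closed-form path avoids dependence on a particular solver or penalty grid; with a correlated design one would instead need a numerical Lasso solver such as coordinate descent.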

Article information

Source
Statist. Sci., Volume 32, Number 1 (2017), 88–105.

Dates
First available in Project Euclid: 6 April 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1491465629

Digital Object Identifier
doi:10.1214/16-STS586

Mathematical Reviews number (MathSciNet)
MR3634308

Zentralblatt MATH identifier
06946265

Keywords
Least absolute shrinkage and selection operator (Lasso); oracle inequalities; high-dimensional data

Citation

Flynn, Cheryl J.; Hurvich, Clifford M.; Simonoff, Jeffrey S. On the Sensitivity of the Lasso to the Number of Predictor Variables. Statist. Sci. 32 (2017), no. 1, 88–105. doi:10.1214/16-STS586. https://projecteuclid.org/euclid.ss/1491465629

