The Annals of Statistics

A significance test for the lasso

Abstract

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an $\operatorname{Exp}(1)$ asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix $X$. On the other hand, our proof for a general step in the lasso path places further technical assumptions on $X$ and the generative model, but still allows for the important high-dimensional case $p>n$, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables.

Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a $\chi^{2}_{1}$ distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than $\chi^{2}_{1}$ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter $\lambda$ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the $\ell_{1}$ penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties—adaptivity and shrinkage—and its null distribution is tractable and asymptotically $\operatorname{Exp}(1)$.

Article information

Source
Ann. Statist., Volume 42, Number 2 (2014), 413-468.

Dates
First available in Project Euclid: 20 May 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1400592161

Digital Object Identifier
doi:10.1214/13-AOS1175

Mathematical Reviews number (MathSciNet)
MR3210970

Zentralblatt MATH identifier
1305.62255

Citation

Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert. A significance test for the lasso. Ann. Statist. 42 (2014), no. 2, 413--468. doi:10.1214/13-AOS1175. https://projecteuclid.org/euclid.aos/1400592161

References

• Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
• Becker, S., Bobin, J. and Candès, E. J. (2011). NESTA: A fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4 1–39.
• Becker, S. R., Candès, E. J. and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3 165–218.
• Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternative direction method of multipliers. Faund. Trends Mach. Learn. 3 1–122.
• Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212–1242.
• Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $\ell_1$ minimization. Ann. Statist. 37 2145–2177.
• Candes, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406–5425.
• Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
• de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, New York.
• Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
• Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461–470.
• Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
• Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh-dimensional regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 37–65.
• Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
• Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
• Fuchs, J. J. (2005). Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inform. Theory 51 3601–3608.
• Grazier G’Sell, M., Taylor, J. and Tibshirani, R. (2013). Adaptive testing for the graphical lasso. Preprint. Available at arXiv:1307.4765.
• Grazier G’Sell, M., Wager, S., Chouldechova, A. and Tibshirani, R. (2013). False discovery rate control for sequential selection procedures, with application to the lasso. Preprint. Available at arXiv:1309.5352.
• Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
• Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning; Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• Javanmard, A. and Montanari, A. (2013a). Confidence intervals and hypothesis testing for high-dimensional regression. Preprint. Available at arXiv:1306.3171.
• Javanmard, A. and Montanari, A. (2013b). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. Preprint. Available at arXiv:1301.4240.
• Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417–473.
• Meinshausen, N., Meier, L. and Bühlmann, P. (2009). $p$-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671–1681.
• Minnier, J., Tian, L. and Cai, T. (2011). A perturbation method for inference on regularized regression estimates. J. Amer. Statist. Assoc. 106 1371–1382.
• Osborne, M. R., Presnell, B. and Turlach, B. A. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403.
• Osborne, M. R., Presnell, B. and Turlach, B. A. (2000b). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319–337.
• Park, M. Y. and Hastie, T. (2007). $L_1$-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 659–677.
• Rhee, S.-Y., Gonzales, M. J., Kantor, R., Betts, B. J., Ravela, J. and Shafer, R. W. (2003). Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res. 31 298–303.
• Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
• Taylor, J., Loftus, J. and Tibshirani, R. J. (2013). Tests in adaptive regression via the Kac–Rice formula. Preprint. Available at arXiv:1308.3020.
• Taylor, J., Takemura, A. and Adler, R. J. (2005). Validity of the expected Euler characteristic heuristic. Ann. Probab. 33 1362–1396.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• Tibshirani, Ryan J. (2013). The lasso problem and uniqueness. Electron. J. Stat. 7 1456–1490.
• Tibshirani, R. J. and Taylor, J. (2012). Degrees of freedom in lasso problems. Ann. Statist. 40 1198–1232.
• van de Geer, S. and Bühlmann, P. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. Preprint. Available at arXiv:1303.0518.
• Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
• Weissman, I. (1978). Estimation of parameters and large quantiles based on the $k$ largest observations. J. Amer. Statist. Assoc. 73 812–815.
• Zhang, C.-H. and Zhang, S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 217–242.
• Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
• Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301–320.
• Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173–2192.