## The Annals of Statistics

### Inference in adaptive regression via the Kac–Rice formula

#### Abstract

We derive an exact $p$-value for testing a global null hypothesis in a general adaptive regression setting. Our approach uses the Kac–Rice formula [as described in Random Fields and Geometry (2007) Springer, New York] applied to the problem of maximizing a Gaussian process. The resulting test statistic has a known distribution in finite samples, assuming Gaussian errors. We examine this test statistic in the case of the lasso, group lasso, principal components and matrix completion problems. For the lasso problem, our test relates closely to the recently proposed covariance test of Lockhart et al. [Ann. Statist. (2014) 42 413–468].

In a few specific settings, our proposed tests will be less powerful than other previously known (and well-established) tests. However, the real strength of our proposal is its generality: we provide a framework for constructing valid tests across a wide class of regularized regression problems, and to our knowledge such a unified view was not possible before this work.
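
To make the lasso case concrete, the following is a minimal sketch rather than the authors' implementation. It assumes known noise level `sigma` and, as a simplifying assumption not made in the abstract, an orthonormal design ($X^TX = I$), so that the first two knots of the lasso path are the largest and second-largest values of $|X_j^Ty|$. Under the global null with Gaussian errors, the ratio of Gaussian tail probabilities at these two knots is then exactly uniform on $[0,1]$. The function names (`normal_sf`, `lasso_global_null_pvalue`, `simulate_null_pvalues`) are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lasso_global_null_pvalue(xty, sigma=1.0):
    """Ratio of Gaussian tail probabilities at the first two lasso knots.

    Assumption: orthonormal design (X^T X = I), so the knots are the
    largest and second-largest |X_j^T y|. `xty` holds the inner products
    X_j^T y and `sigma` is the known noise standard deviation.
    """
    if len(xty) < 2:
        raise ValueError("need at least two predictors")
    knots = sorted((abs(v) for v in xty), reverse=True)
    lam1, lam2 = knots[0], knots[1]
    # Small values indicate that the first knot is unusually far above the second.
    return normal_sf(lam1 / sigma) / normal_sf(lam2 / sigma)


def simulate_null_pvalues(n_reps=2000, p=10, sigma=1.0, seed=0):
    """Draw X^T y ~ N(0, sigma^2 I) under the global null and collect p-values."""
    rng = random.Random(seed)
    return [
        lasso_global_null_pvalue([rng.gauss(0.0, sigma) for _ in range(p)], sigma)
        for _ in range(n_reps)
    ]
```

Calling `simulate_null_pvalues()` and plotting a histogram of the result is one way to check the uniformity claim empirically in this special case. For very large knots the tail probabilities underflow, so a production version would work on the log scale.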

#### Article information

Source
Ann. Statist., Volume 44, Number 2 (2016), 743–770.

Dates
Revised: September 2015
First available in Project Euclid: 17 March 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1458245734

Digital Object Identifier
doi:10.1214/15-AOS1386

Mathematical Reviews number (MathSciNet)
MR3476616

Zentralblatt MATH identifier
1337.62304

Subjects
Primary: 62M40: Random fields; image analysis
Secondary: 62J05: Linear regression

#### Citation

Taylor, Jonathan E.; Loftus, Joshua R.; Tibshirani, Ryan J. Inference in adaptive regression via the Kac–Rice formula. Ann. Statist. 44 (2016), no. 2, 743–770. doi:10.1214/15-AOS1386. https://projecteuclid.org/euclid.aos/1458245734

#### References

• [1] Adler, R. J. and Taylor, J. E. (2007). Random Fields and Geometry. Springer, New York.
• [2] Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533–2556.
• [3] Azaïs, J.-M. and Wschebor, M. (2008). A general expression for the distribution of the maximum of a Gaussian field and the approximation of the tail. Stochastic Process. Appl. 118 1190–1218.
• [4] Brillinger, D. R. (1972). On the number of solutions of systems of random equations. Ann. Math. Statist. 43 534–540.
• [5] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717–772.
• [6] Choi, Y., Taylor, J. and Tibshirani, R. (2014). Selecting the number of principal components: Estimation of the true rank of a noisy matrix. Preprint. Available at arXiv:1410.8260.
• [7] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
• [8] Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc. 93 418–491.
• [9] Fithian, W., Sun, D. and Taylor, J. (2014). Optimal inference after model selection. Preprint. Available at arXiv:1410.2597.
• [10] Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the lasso. Ann. Statist. 42 413–468.
• [11] Loftus, J. and Taylor, J. (2014). A significance test for forward stepwise model selection. Preprint. Available at arXiv:1405.3920.
• [12] Mazumder, R., Hastie, T. and Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11 2287–2322.
• [13] Mukherjee, A., Chen, K., Wang, N. and Zhu, J. (2015). On the degrees of freedom of reduced-rank estimators in multivariate regression. Biometrika 102 457–477.
• [14] Takemura, A. and Kuriki, S. (2002). On the equivalence of the tube and Euler characteristic methods for the distribution of the maximum of Gaussian fields over piecewise smooth domains. Ann. Appl. Probab. 12 768–796.
• [15] Taylor, J. (2013). The geometry of least squares in the 21st century. Bernoulli 19 1449–1464.
• [16] Taylor, J., Lockhart, R., Tibshirani, R. J. and Tibshirani, R. (2014). Exact post-selection inference for sequential regression procedures. Preprint. Available at arXiv:1401.3889.
• [17] Taylor, J., Loftus, J. and Tibshirani, R. J. (2015). Supplement to “Inference in adaptive regression via the Kac–Rice formula.” DOI:10.1214/15-AOS1386SUPP.
• [18] Taylor, J., Takemura, A. and Adler, R. J. (2005). Validity of the expected Euler characteristic heuristic. Ann. Probab. 33 1362–1396.
• [19] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• [20] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.
• [21] Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electron. J. Stat. 7 1456–1490.
• [22] Tibshirani, R. J. and Taylor, J. (2011). The solution path of the generalized lasso. Ann. Statist. 39 1335–1371.
• [23] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.