The Annals of Statistics

Test for high-dimensional regression coefficients using refitted cross-validation variance estimation

Hengjian Cui, Wenwen Guo, and Wei Zhong

Abstract

Testing a hypothesis for high-dimensional regression coefficients is of fundamental importance in statistical theory and applications. In this paper, we develop a new test for the overall significance of coefficients in high-dimensional linear regression models based on an estimated U-statistic of order two. With the aid of the martingale central limit theorem, we prove that the proposed test statistic is asymptotically normal under two different distribution assumptions. Refitted cross-validation (RCV) variance estimation is used to avoid overestimating the variance and to enhance the empirical power. We examine the finite-sample performance of the proposed test via Monte Carlo simulations, which show that the new test based on the RCV estimator achieves higher power, especially in sparse cases. We also demonstrate an application through an empirical analysis of a microarray data set on Yorkshire gilts.
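The refitted cross-validation idea named in the abstract can be illustrated with a minimal sketch. It follows the general RCV recipe of Fan, Guo and Hao (2012): split the sample in two, select variables on one half, refit ordinary least squares on the other half using only the selected variables, and average the two resulting variance estimates. It is not the authors' implementation. The function names, the marginal-correlation screening rule, the intercept term, and the choice of `k` are all assumptions made for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def screen(x, y, k):
    """Hypothetical selection step: keep the k columns whose absolute
    marginal correlation with y is largest (any selector could be used)."""
    n, p = len(x), len(x[0])
    y_bar = sum(y) / n
    scores = []
    for j in range(p):
        col = [row[j] for row in x]
        c_bar = sum(col) / n
        cov = sum((c - c_bar) * (v - y_bar) for c, v in zip(col, y))
        var = sum((c - c_bar) ** 2 for c in col)
        # |cov| / sqrt(var) is proportional to |correlation| across columns.
        scores.append(abs(cov) / var ** 0.5 if var > 0 else 0.0)
    return sorted(range(p), key=lambda j: -scores[j])[:k]


def refit_variance(x, y, selected):
    """OLS of y on an intercept plus the selected columns; returns the
    residual sum of squares divided by the residual degrees of freedom."""
    design = [[1.0] + [row[j] for j in selected] for row in x]
    d = len(design[0])
    gram = [[sum(r[a] * r[b] for r in design) for b in range(d)]
            for a in range(d)]
    xty = [sum(r[a] * v for r, v in zip(design, y)) for a in range(d)]
    beta = solve_linear_system(gram, xty)
    rss = sum((v - sum(b * t for b, t in zip(beta, r))) ** 2
              for r, v in zip(design, y))
    return rss / (len(y) - d)


def rcv_variance(x, y, k):
    """Average the two cross-fitted variance estimates."""
    half = len(y) // 2
    x1, y1, x2, y2 = x[:half], y[:half], x[half:], y[half:]
    s1 = refit_variance(x2, y2, screen(x1, y1, k))  # select on half 1, refit on half 2
    s2 = refit_variance(x1, y1, screen(x2, y2, k))  # select on half 2, refit on half 1
    return 0.5 * (s1 + s2)


# Simulated sparse example (assumed setup): 3 active coefficients, noise variance 1.
rng = random.Random(0)
n, p = 200, 50
x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * row[0] + 2 * row[1] + 2 * row[2] + rng.gauss(0, 1) for row in x]
estimate = rcv_variance(x, y, k=5)
print(estimate)
```

Because selection and refitting use disjoint halves, variables that were selected only through spurious correlation with the noise do not absorb noise in the refit, which is the mechanism by which RCV is meant to avoid a biased variance estimate.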

Article information

Source
Ann. Statist., Volume 46, Number 3 (2018), 958-988.

Dates
Received: February 2016
Revised: April 2017
First available in Project Euclid: 3 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1525313072

Digital Object Identifier
doi:10.1214/17-AOS1573

Mathematical Reviews number (MathSciNet)
MR3797993

Zentralblatt MATH identifier
1392.62159

Subjects
Primary: 62F03: Hypothesis testing; 62H15: Hypothesis testing

Keywords
High-dimensional regression; hypothesis testing; martingale central limit theorem; refitted cross-validation variance estimation; U-statistics

Citation

Cui, Hengjian; Guo, Wenwen; Zhong, Wei. Test for high-dimensional regression coefficients using refitted cross-validation variance estimation. Ann. Statist. 46 (2018), no. 3, 958–988. doi:10.1214/17-AOS1573. https://projecteuclid.org/euclid.aos/1525313072



Supplemental materials

  • Supplement to “Test for high-dimensional regression coefficients using refitted cross-validation variance estimation”. This supplemental article contains the proof of Theorem 3.2 and additional figures of empirical powers of different tests.