The Annals of Statistics

Goodness-of-fit tests for high-dimensional Gaussian linear models

Nicolas Verzelen and Fanny Villers

Full-text: Open access

Abstract

Let (Y, (X_i)_{1≤i≤p}) be a real zero-mean Gaussian vector and let V be a subset of {1, …, p}. Suppose we are given n i.i.d. replications of this vector. We propose a new test of the hypothesis that Y is independent of (X_i)_{i∈{1, …, p}∖V} conditionally on (X_i)_{i∈V}, against the general alternative that it is not. This procedure does not depend on any prior information about the covariance of X or the variance of Y and applies in a high-dimensional setting. It extends straightforwardly to testing the neighborhood of a Gaussian graphical model. The procedure is based on a model of Gaussian regression with random Gaussian covariates. We give nonasymptotic properties of the test and prove that it is rate optimal [up to a possible log(n) factor] over various classes of alternatives under some additional assumptions. Moreover, it allows us to derive nonasymptotic minimax rates of testing in this random-design setting. Finally, we carry out a simulation study in order to evaluate the performance of our procedure.
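As a rough illustration of the regression viewpoint (and not the authors' exact procedure or calibration), the null hypothesis above amounts to saying that, in the Gaussian linear regression of Y on (X_1, …, X_p), the coefficients of the covariates outside V vanish. The sketch below, in Python with numpy and scipy, combines classical partial F (Fisher) tests of "beta_S = 0" over a user-chosen collection of small subsets S of {1, …, p}∖V, with a crude Bonferroni correction. The helper names, the choice of collection and the plain Bonferroni levels are illustrative assumptions; the paper selects the collection of models and the per-model levels so that the resulting multiple test is rate optimal.

    # Illustrative sketch only: Bonferroni combination of classical partial F-tests
    # for H0: Y independent of (X_i, i not in V) given (X_i, i in V), assuming a
    # Gaussian linear model. Not the calibration used in the paper.
    import numpy as np
    from scipy.stats import f as f_dist

    def partial_f_pvalue(Y, X, V, S):
        # P-value of the partial F-test of "beta_S = 0" in the regression of Y
        # on the columns of X indexed by V and S (each model stays low-dimensional,
        # so this works even when p is much larger than n).
        n = len(Y)
        V, S = list(V), list(S)
        XV, XVS = X[:, V], X[:, V + S]
        rss_V = np.sum(Y ** 2) if XV.shape[1] == 0 else \
            np.sum((Y - XV @ np.linalg.lstsq(XV, Y, rcond=None)[0]) ** 2)
        rss_VS = np.sum((Y - XVS @ np.linalg.lstsq(XVS, Y, rcond=None)[0]) ** 2)
        d1, d2 = len(S), n - XVS.shape[1]   # numerator / denominator degrees of freedom
        stat = ((rss_V - rss_VS) / d1) / (rss_VS / d2)
        return f_dist.sf(stat, d1, d2)

    def goodness_of_fit_test(Y, X, V, collection, alpha=0.05):
        # Reject the null as soon as one test in the collection is significant
        # at the Bonferroni-corrected level alpha / |collection|.
        pvals = [partial_f_pvalue(Y, X, V, S) for S in collection]
        return min(pvals) <= alpha / len(collection), pvals

For instance, with n = 50 observations of p = 200 covariates, V = {0, 1} and a collection made of all singletons outside V, one would call goodness_of_fit_test(Y, X, [0, 1], [[j] for j in range(2, 200)]); under the null each statistic follows an F distribution with (|S|, n − |V| − |S|) degrees of freedom, which is what calibrates the p-values.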

Article information

Source
Ann. Statist., Volume 38, Number 2 (2010), 704-752.

Dates
First available in Project Euclid: 19 February 2010

Permanent link to this document
https://projecteuclid.org/euclid.aos/1266586612

Digital Object Identifier
doi:10.1214/08-AOS629

Mathematical Reviews number (MathSciNet)
MR2604699

Zentralblatt MATH identifier
1183.62074

Subjects
Primary: 62J05: Linear regression
Secondary: 62G10: Hypothesis testing; 62H20: Measures of association (correlation, canonical correlation, etc.)

Keywords
Gaussian graphical models; linear regression; multiple testing; ellipsoid; adaptive testing; minimax hypothesis testing; minimax separation rate; goodness-of-fit

Citation

Verzelen, Nicolas; Villers, Fanny. Goodness-of-fit tests for high-dimensional Gaussian linear models. Ann. Statist. 38 (2010), no. 2, 704–752. doi:10.1214/08-AOS629. https://projecteuclid.org/euclid.aos/1266586612


