Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

High-dimensional Gaussian model selection on a Gaussian design

Nicolas Verzelen

Full-text: Open access

Abstract

We consider the problem of estimating the conditional mean of a real Gaussian variable Y = ∑_{i=1}^p θ_i X_i + ε, where the vector of covariates (X_i)_{1≤i≤p} follows a joint Gaussian distribution. This issue often arises when one aims to estimate the graph or the distribution of a Gaussian graphical model. We introduce a general model selection procedure based on the minimization of a penalized least squares type criterion. It handles a variety of problems, such as ordered and complete variable selection, allows one to incorporate prior knowledge on the model, and applies when the number of covariates p is larger than the number of observations n. Moreover, it is shown to achieve a non-asymptotic oracle inequality independently of the correlation structure of the covariates. We also exhibit various minimax rates of estimation in the considered framework and hence derive adaptivity properties of our procedure.
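To fix ideas, the criterion described above can be sketched in code. The snippet below is an illustrative toy version, not the paper's exact procedure: it performs ordered variable selection for Y = ∑ θ_i X_i + ε by minimizing a residual sum of squares inflated by a multiplicative penalty. The penalty factor (1 + K·|m|/n) and the constant K are hypothetical placeholders; the penalties analyzed in the paper depend on the model collection and on n and p.

```python
import numpy as np

def residual_ss(X, y, support):
    """Least-squares residual sum of squares of y on the columns in support."""
    if len(support) == 0:
        return float(y @ y)
    Xs = X[:, support]
    theta, _, _, _ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ theta
    return float(r @ r)

def select_ordered(X, y, K=4.0):
    """Ordered variable selection: among the nested supports {1..d},
    pick the one minimizing RSS(m) * (1 + K*|m|/n).
    K is an illustrative tuning constant, not the paper's penalty."""
    n, p = X.shape
    best_crit, best_m = np.inf, []
    for d in range(min(n, p) + 1):
        support = list(range(d))
        crit = residual_ss(X, y, support) * (1.0 + K * d / n)
        if crit < best_crit:
            best_crit, best_m = crit, support
    return best_m

# Toy simulation: only the first 3 of 20 covariates are active.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
theta = np.zeros(p)
theta[:3] = [2.0, -1.5, 1.0]
y = X @ theta + 0.1 * rng.standard_normal(n)
m_hat = select_ordered(X, y)
print(len(m_hat))
```

With a strong signal and small noise as above, the selected support should contain the three active covariates and remain small, reflecting the bias/variance trade-off that the penalty is designed to balance.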

Résumé

We consider the estimation of the conditional expectation of a Gaussian variable. This problem commonly arises when one wants to estimate the graph or the distribution of a Gaussian graphical model. In this article, we introduce a model selection procedure based on the minimization of a penalized least squares criterion. This general method handles a wide range of problems, such as ordered or complete variable selection. Moreover, it remains valid in the high-dimensional setting, where the number of covariates is much larger than the number of observations. The resulting estimator satisfies a non-asymptotic oracle inequality, whatever the correlation between the covariates. We also derive minimax rates of estimation in this framework and show that our procedure satisfies various adaptivity properties.

Article information

Source
Ann. Inst. H. Poincaré Probab. Statist., Volume 46, Number 2 (2010), 480-524.

Dates
First available in Project Euclid: 11 May 2010

Permanent link to this document
https://projecteuclid.org/euclid.aihp/1273584132

Digital Object Identifier
doi:10.1214/09-AIHP321

Mathematical Reviews number (MathSciNet)
MR2667707

Zentralblatt MATH identifier
1191.62076

Subjects
Primary: 62J05: Linear regression
Secondary: 62G08: Nonparametric regression

Keywords
Model selection; Linear regression; Oracle inequalities; Gaussian graphical models; Minimax rates of estimation

Citation

Verzelen, Nicolas. High-dimensional Gaussian model selection on a Gaussian design. Ann. Inst. H. Poincaré Probab. Statist. 46 (2010), no. 2, 480–524. doi:10.1214/09-AIHP321. https://projecteuclid.org/euclid.aihp/1273584132


