Electronic Journal of Statistics

High-dimensional inference in misspecified linear models

Peter Bühlmann and Sara van de Geer


Abstract

We consider high-dimensional inference when the assumed linear model is misspecified. We describe correct interpretations of the model parameters, together with corresponding sufficient assumptions for valid asymptotic inference, such that the parameters retain a useful meaning even when the model is misspecified. We largely focus on the de-sparsified Lasso procedure, but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification.
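As a concrete illustration of the procedure the abstract refers to: the de-sparsified Lasso (Zhang and Zhang, 2014; van de Geer et al., 2014) takes an initial Lasso fit and applies a one-step bias correction built from a nodewise Lasso regression, yielding an asymptotically normal estimator for a single coordinate. The sketch below is ours, written in Python with scikit-learn rather than the authors' hdi R package; the function name, the cross-validated tuning, and the crude residual-based noise estimate are illustrative assumptions, not the paper's prescriptions.

```python
# A minimal sketch of the de-sparsified Lasso for one coordinate,
# following the standard construction; not the hdi package API.
import numpy as np
from sklearn.linear_model import LassoCV

def desparsified_lasso_coord(X, y, j):
    """De-sparsified Lasso estimate and standard error for coefficient j."""
    n, p = X.shape

    # Step 1: initial Lasso fit of y on X (lambda chosen by cross-validation;
    # the paper's theory uses a deterministic choice of lambda).
    fit = LassoCV(cv=5).fit(X, y)
    beta = fit.coef_
    resid = y - X @ beta - fit.intercept_

    # Step 2: nodewise Lasso of X_j on the remaining columns; the residual
    # Z_j serves as an approximate projection direction for coordinate j.
    X_minus_j = np.delete(X, j, axis=1)
    node = LassoCV(cv=5).fit(X_minus_j, X[:, j])
    Z_j = X[:, j] - node.predict(X_minus_j)

    # Step 3: one-step bias correction of the Lasso coefficient:
    # b_j = beta_j + Z_j'(y - X beta) / (Z_j' X_j).
    denom = Z_j @ X[:, j]
    b_j = beta[j] + Z_j @ resid / denom

    # Step 4: plug-in asymptotic standard error; the noise level is
    # estimated crudely from the Lasso residuals (an assumption here).
    df = max(n - np.sum(beta != 0), 1)
    sigma_hat = np.sqrt(resid @ resid / df)
    se_j = sigma_hat * np.linalg.norm(Z_j) / abs(denom)
    return b_j, se_j

# Usage: an approximate 95% confidence interval for the j-th parameter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))
y = 2.0 * X[:, 0] + rng.standard_normal(100)
b, se = desparsified_lasso_coord(X, y, j=0)
print(f"estimate {b:.3f}, CI [{b - 1.96 * se:.3f}, {b + 1.96 * se:.3f}]")
```

Under misspecification, the interval above is interpreted as covering the corresponding projection parameter (the best linear approximation) rather than a coefficient of a "true" linear model, which is the point of the paper.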

Article information

Source
Electron. J. Statist., Volume 9, Number 1 (2015), 1449-1473.

Dates
Received: March 2015
First available in Project Euclid: 7 July 2015

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1436277593

Digital Object Identifier
doi:10.1214/15-EJS1041

Mathematical Reviews number (MathSciNet)
MR3367666

Zentralblatt MATH identifier
1327.62420

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62F25: Tolerance and confidence regions

Keywords
Confidence interval; de-sparsified Lasso; hypothesis test; Lasso; multiple sample splitting; sparsity

Citation

Bühlmann, Peter; van de Geer, Sara. High-dimensional inference in misspecified linear models. Electron. J. Statist. 9 (2015), no. 1, 1449–1473. doi:10.1214/15-EJS1041. https://projecteuclid.org/euclid.ejs/1436277593

