Electronic Journal of Statistics

Fast Bayesian variable selection for high dimensional linear models: Marginal solo spike and slab priors

Su Chen and Stephen G. Walker

Full-text: Open access

Abstract

This paper presents a method for fast Bayesian variable selection in the normal linear regression model with high dimensional data. A novel approach is adopted in which an explicit posterior probability for including a covariate is obtained. The method is sequential but not order dependent: covariates are dealt with one at a time, and a spike and slab prior is assigned only to the coefficient under investigation. We adopt the well-known spike and slab Gaussian priors with a sample size dependent variance, which achieves strong selection consistency for marginal posterior probabilities even when the number of covariates grows almost exponentially with sample size. Numerical illustrations show that the new approach provides essentially the same results as the standard spike and slab priors, i.e. the same marginal posterior probabilities of the coefficients being nonzero, which under the standard priors are estimated via Gibbs sampling. Hence, we obtain the same results via the direct calculation of $p$ probabilities, rather than a stochastic search over a space of $2^{p}$ elements. Our procedure requires only $p$ probabilities to be calculated, each of which can be done exactly, so parallel computation is feasible when $p$ is large.
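
To make the contrast between $p$ direct calculations and a search over $2^{p}$ models concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the formula derived in the paper, which is not reproduced on this page. The sketch assumes each covariate is examined in a one-covariate model $y = x_j \beta_j + \varepsilon$ with known noise variance, a narrow Gaussian spike and a wide Gaussian slab on $\beta_j$, and fixed prior inclusion probability. The function name `marginal_inclusion_probs` and all parameter names and default values are hypothetical.

```python
import numpy as np

def marginal_inclusion_probs(X, y, sigma2=1.0, tau0_sq=1e-4, tau1_sq=10.0,
                             prior_incl=0.5):
    """Closed-form inclusion probability for each column of X, one at a time.

    Assumption (illustrative only): for covariate j the model is
    y = x_j * beta_j + noise, noise ~ N(0, sigma2 * I), with
    beta_j ~ N(0, tau0_sq) under the spike and N(0, tau1_sq) under the slab.
    """
    xtx = np.einsum("ij,ij->j", X, X)  # x_j' x_j for every column j
    xty = X.T @ y                      # x_j' y for every column j

    def log_marginal(tau_sq):
        # log N(y; 0, sigma2 * I + tau_sq * x_j x_j'), dropping terms that
        # do not depend on tau_sq (matrix determinant lemma and
        # Sherman-Morrison give the two pieces below).
        log_det = np.log1p(tau_sq * xtx / sigma2)
        quad_gain = tau_sq * xty**2 / (sigma2 * (sigma2 + tau_sq * xtx))
        return -0.5 * log_det + 0.5 * quad_gain

    log_odds = (np.log(prior_incl) - np.log1p(-prior_incl)
                + log_marginal(tau1_sq) - log_marginal(tau0_sq))
    # Numerically stable logistic transform of the posterior log odds.
    return 0.5 * (1.0 + np.tanh(0.5 * log_odds))
```

Each entry of the returned vector depends only on its own column of `X`, so the $p$ calculations involve no sampling and could be split across workers. The abstract's sample size dependent prior variance would correspond to choosing `tau0_sq` and `tau1_sq` as functions of $n$; the specific choice made in the paper is not stated here.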

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 284-309.

Dates
Received: July 2018
First available in Project Euclid: 5 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1549335678

Digital Object Identifier
doi:10.1214/18-EJS1529

Mathematical Reviews number (MathSciNet)
MR3910035

Zentralblatt MATH identifier
07021706

Keywords
Bayesian variable selection; spike and slab priors; high dimensional linear model; strong selection consistency; parallel computation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Chen, Su; Walker, Stephen G. Fast Bayesian variable selection for high dimensional linear models: Marginal solo spike and slab priors. Electron. J. Statist. 13 (2019), no. 1, 284–309. doi:10.1214/18-EJS1529. https://projecteuclid.org/euclid.ejs/1549335678


