## The Annals of Statistics

### Bayesian variable selection with shrinking and diffusing priors

#### Abstract

We consider a Bayesian approach to variable selection in the presence of high-dimensional covariates, based on a hierarchical model that places prior distributions on the regression coefficients as well as on the model space. We adopt the well-known spike and slab Gaussian priors with a distinct feature: the prior variances depend on the sample size, through which appropriate shrinkage can be achieved. We show the strong selection consistency of the proposed method in the sense that the posterior probability of the true model converges to one even when the number of covariates grows nearly exponentially with the sample size. This is arguably the strongest selection consistency result available in the Bayesian variable selection literature; yet the proposed method can be carried out through posterior sampling with a simple Gibbs sampler. Furthermore, we argue that the proposed method is asymptotically similar to model selection with the $L_{0}$ penalty. We also demonstrate through empirical work the favorable performance of the proposed approach relative to some state-of-the-art alternatives.
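To make the abstract's description concrete, the following is a minimal sketch of a Gibbs sampler for spike-and-slab Gaussian priors in linear regression, with the spike variance shrinking and the slab variance diffusing as the sample size $n$ grows. This is an illustration only, not the paper's implementation: the specific scalings `tau0_sq = 1/n` and `tau1_sq = max(1, n/10)`, the fixed inclusion probability `q`, and the fixed error variance `sigma_sq = 1` are simplifying assumptions (the paper places priors on these quantities and derives rate conditions for consistency).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n = 100 observations, p = 10 covariates, 2 truly active.
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -3.0]
y = X @ beta_true + rng.standard_normal(n)

# Sample-size-dependent prior variances (illustrative scalings):
tau0_sq = 1.0 / n            # spike variance, shrinking with n
tau1_sq = max(1.0, n / 10.0) # slab variance, diffusing with n
q = 0.1                      # prior inclusion probability (fixed here)
sigma_sq = 1.0               # error variance (fixed here for simplicity)

n_iter, burn = 2000, 500
Z = np.ones(p, dtype=int)    # inclusion indicators
incl = np.zeros(p)           # accumulates posterior inclusion frequencies

XtX = X.T @ X
Xty = X.T @ y

for it in range(n_iter):
    # beta | Z, y : Gaussian with precision (X'X + D) / sigma_sq,
    # where D holds the inverse prior variances selected by Z.
    D = np.where(Z == 1, 1.0 / tau1_sq, 1.0 / tau0_sq)
    A = XtX + np.diag(D)
    cov = sigma_sq * np.linalg.inv(A)
    mean = np.linalg.solve(A, Xty)
    beta = rng.multivariate_normal(mean, cov)

    # Z_j | beta_j : Bernoulli, comparing slab vs. spike prior densities.
    log_slab = (-0.5 * beta**2 / (sigma_sq * tau1_sq)
                - 0.5 * np.log(tau1_sq) + np.log(q))
    log_spike = (-0.5 * beta**2 / (sigma_sq * tau0_sq)
                 - 0.5 * np.log(tau0_sq) + np.log(1 - q))
    diff = np.clip(log_spike - log_slab, -50.0, 50.0)  # numerical safety
    prob_slab = 1.0 / (1.0 + np.exp(diff))
    Z = (rng.random(p) < prob_slab).astype(int)

    if it >= burn:
        incl += Z

incl /= (n_iter - burn)
print("posterior inclusion probabilities:", np.round(incl, 2))
```

With a strong signal on the first two coefficients, the sampler's posterior inclusion probabilities concentrate on the true model; the shrinking spike pulls excluded coefficients toward zero while the diffusing slab leaves active coefficients nearly unpenalized, which is the mechanism behind the $L_0$-like behavior the abstract mentions.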

#### Article information

Source
Ann. Statist., Volume 42, Number 2 (2014), 789–817.

Dates
First available in Project Euclid: 20 May 2014

https://projecteuclid.org/euclid.aos/1400592178

Digital Object Identifier
doi:10.1214/14-AOS1207

Mathematical Reviews number (MathSciNet)
MR3210987

Zentralblatt MATH identifier
1302.62158

#### Citation

Narisetty, Naveen Naidu; He, Xuming. Bayesian variable selection with shrinking and diffusing priors. Ann. Statist. 42 (2014), no. 2, 789--817. doi:10.1214/14-AOS1207. https://projecteuclid.org/euclid.aos/1400592178

#### References

• Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection. Ann. Statist. 32 870–897.
• Bondell, H. D. and Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64 115–123, 322–323.
• Bondell, H. D. and Reich, B. J. (2012). Consistent high-dimensional Bayesian variable selection via penalized credible regions. J. Amer. Statist. Assoc. 107 1610–1624.
• Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• Dey, T., Ishwaran, H. and Rao, J. S. (2008). An in-depth look at highest posterior model selection. Econometric Theory 24 377–403.
• Dicker, L., Huang, B. and Lin, X. (2013). Variable selection and estimation with the seamless-$L_0$ penalty. Statist. Sinica 23 929–962.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849–911.
• Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101–148.
• Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
• George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731–747.
• George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
• Hsu, D., Kakade, S. M. and Zhang, T. (2012). A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17 6.
• Huang, J. and Xie, H. (2007). Asymptotic oracle properties of SCAD-penalized least squares estimators. In Asymptotics: Particles, Processes and Inverse Problems. Institute of Mathematical Statistics Lecture Notes—Monograph Series 55 149–166. IMS, Beachwood, OH.
• Ishwaran, H., Kogalur, U. B. and Rao, J. S. (2010). spikeslab: Prediction and variable selection using spike and slab regression. The R Journal 2 68–73.
• Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33 730–773.
• Ishwaran, H. and Rao, J. S. (2011). Consistency of spike and slab regression. Statist. Probab. Lett. 81 1920–1928.
• James, G. M., Radchenko, P. and Lv, J. (2009). DASSO: Connections between the Dantzig selector and lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 127–142.
• Jiang, W. (2007). Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 1487–1511.
• Johnson, V. E. and Rossell, D. (2012). Bayesian model selection in high-dimensional settings. J. Amer. Statist. Assoc. 107 649–660.
• Kim, Y., Kwon, S. and Choi, H. (2012). Consistent model selection criteria on high dimensions. J. Mach. Learn. Res. 13 1037–1057.
• Lan, H., Chen, M., Flowers, J. B., Yandell, B. S., Stapleton, D. S., Mata, C. M., Mui, E. T., Flowers, M. T., Schueler, K. L., Manly, K. F., Williams, R. W., Kendziorski, K. and Attie, A. D. (2006). Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2 e6.
• Liang, F., Song, Q. and Yu, K. (2013). Bayesian subset modeling for high-dimensional generalized linear models. J. Amer. Statist. Assoc. 108 589–606.
• Liu, Y. and Wu, Y. (2007). Variable selection via a combination of the $L_0$ and $L_1$ penalties. J. Comput. Graph. Statist. 16 782–798.
• Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83 1023–1036.
• Moreno, E., Girón, F. J. and Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Ann. Statist. 38 1937–1952.
• Narisetty, N. N. and He, X. (2014). Supplement to “Bayesian variable selection with shrinking and diffusing priors.” DOI:10.1214/14-AOS1207SUPP.
• Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
• Shen, X., Pan, W. and Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. J. Amer. Statist. Assoc. 107 223–232.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
• Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
• Yang, Y. and He, X. (2012). Bayesian empirical likelihood for quantile regression. Ann. Statist. 40 1102–1131.
• Yuan, M. and Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Amer. Statist. Assoc. 100 1215–1225.
• Zhang, D., Lin, Y. and Zhang, M. (2009). Penalized orthogonal-components regression for large $p$ small $n$ data. Electron. J. Stat. 3 781–796.
• Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.