Bayesian Analysis

Bayesian Variable Selection and Estimation for Group Lasso

Xiaofan Xu and Malay Ghosh


This paper revisits the Bayesian group lasso and uses spike and slab priors for group variable selection. In the process, we demonstrate the connection of our model with penalized regression and point out the role of the posterior median for thresholding. We show that the posterior median estimator has the oracle property for group variable selection and estimation under orthogonal designs, while the group lasso has a suboptimal asymptotic estimation rate when variable selection consistency is achieved. Next, we consider the bi-level selection problem and propose Bayesian sparse group selection, again with spike and slab priors, to select variables both at the group level and within a group. We demonstrate via simulation that the posterior median estimator of our spike and slab models has excellent performance for both variable selection and estimation.
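
To illustrate why a posterior median can act as a thresholding rule, here is a minimal sketch in a deliberately simplified setting: a single scalar coefficient observed with Gaussian noise, a point mass at zero as the spike, and a normal slab. This is not the paper's group model or its hierarchical prior; the function name `posterior_median` and the parameters `s`, `tau`, and `pi0` are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def posterior_median(z, s=1.0, tau=3.0, pi0=0.5):
    """Posterior median of beta given z ~ N(beta, s^2) under the prior
    beta ~ pi0 * delta_0 + (1 - pi0) * N(0, tau^2).

    Illustrative scalar setting only (hypothetical parameter names).
    """
    std = NormalDist()
    # Marginal density of z under the spike (beta = 0) and under the slab.
    spike = pi0 * NormalDist(0.0, s).pdf(z)
    slab = (1.0 - pi0) * NormalDist(0.0, sqrt(s**2 + tau**2)).pdf(z)
    l = spike / (spike + slab)  # posterior probability that beta = 0

    # Conditional on beta != 0, the posterior is N(m, sd^2).
    m = z * tau**2 / (s**2 + tau**2)
    sd = sqrt(s**2 * tau**2 / (s**2 + tau**2))

    # Posterior mass strictly below zero.
    below = (1.0 - l) * std.cdf(-m / sd)
    if below <= 0.5 <= below + l:
        return 0.0  # the point mass at zero straddles the 50% level
    if below > 0.5:
        return m + sd * std.inv_cdf(0.5 / (1.0 - l))
    return m + sd * std.inv_cdf((0.5 - l) / (1.0 - l))


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 5.0):
        print(z, posterior_median(z))
```

In this sketch, small observations are mapped exactly to zero because the posterior mass at zero covers the 50% level, while large observations are shrunk toward zero but stay nonzero. That is the sense in which a posterior median can select and estimate at the same time.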

Article information

Bayesian Anal., Volume 10, Number 4 (2015), 909-936.

First available in Project Euclid: 4 February 2015

Keywords: group variable selection; spike and slab prior; Gibbs sampling; median thresholding


Xu, Xiaofan; Ghosh, Malay. Bayesian Variable Selection and Estimation for Group Lasso. Bayesian Anal. 10 (2015), no. 4, 909-936. doi:10.1214/14-BA929.

