Bayesian Analysis

Incorporating Grouping Information in Bayesian Variable Selection with Applications in Genomics

Veronika Rockova and Emmanuel Lesaffre


In many applications the goal is to identify a limited number of important explanatory factors (representing groups of potentially overlapping predictors) rather than the original predictor variables. The commonly imposed requirement that clustered predictors enter the model simultaneously can be limiting, because not all variables within a group need be associated with the outcome; within-group sparsity is often desirable as well. Here we propose a Bayesian variable selection method that uses the grouping information to introduce more equal competition for entry into the model within groups, rather than as a source of strict regularization constraints. This is achieved within the Bayesian LASSO (least absolute shrinkage and selection operator) framework by allowing each regression coefficient to be penalized differentially and by adding a regression layer that relates the individual penalty parameters to a group identification matrix. The proposed hierarchical model therefore enables simultaneous inference on two levels: (1) the regression layer for the continuous outcome in relation to the predictors and (2) the regression layer for the penalty parameters in relation to the grouping information. The method accommodates both overlapping and non-overlapping groups. It does not assume within-group homogeneity of the regression coefficients, an assumption implicit in many structured penalized likelihood approaches; smoothness is instead enforced at the penalty level rather than on the regression coefficients. To broaden the method's applicability, we develop two rapid computational procedures based on the expectation-maximization (EM) algorithm, which offer substantial time savings in applications where high dimensionality renders Markov chain Monte Carlo (MCMC) approaches less practical. We demonstrate the usefulness of the method by predicting time to death in glioblastoma patients using pathways of genes.
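
The two-layer idea described above can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not state the link function or the priors, so the log-linear link between penalties and group membership, the function names (`group_matrix`, `penalties`, `soft_threshold`), the values in `theta`, and the toy groups are all assumptions made for illustration. The sketch builds a 0/1 group identification matrix that allows overlap, maps it to coefficient-specific penalties, and shows how a single penalty acts on a single coefficient through the standard soft-thresholding rule, which is the LASSO solution under an orthonormal design.

```python
import math


def group_matrix(n_predictors, groups):
    """Build a p x G 0/1 group identification matrix; groups may overlap."""
    Z = [[0] * len(groups) for _ in range(n_predictors)]
    for g, members in enumerate(groups):
        for j in members:
            Z[j][g] = 1
    return Z


def penalties(Z, theta, intercept=0.0):
    """Coefficient-specific penalties from group membership.

    ASSUMPTION (not taken from the abstract): a log-linear layer,
    log(lambda_j) = intercept + sum_g Z[j][g] * theta[g].
    """
    return [
        math.exp(intercept + sum(z * t for z, t in zip(row, theta)))
        for row in Z
    ]


def soft_threshold(b, lam):
    """Minimizer of 0.5 * (x - b)**2 + lam * |x| over x.

    This is the LASSO estimate for one coefficient when the design is
    orthonormal and b is the least-squares estimate.
    """
    return math.copysign(max(abs(b) - lam, 0.0), b)


# Hypothetical toy setup: 5 predictors, two groups sharing predictor 2,
# and predictor 4 belonging to no group.
groups = [[0, 1, 2], [2, 3]]
Z = group_matrix(5, groups)

# Hypothetical group effects on the log-penalty: group 0 lowers the
# penalty of its members, group 1 raises it.
theta = [-1.0, 0.5]
lam = penalties(Z, theta)

# Apply each coefficient's own penalty to hypothetical least-squares estimates.
ols = [1.2, -0.2, 0.9, 1.0, -1.5]
shrunk = [soft_threshold(b, l) for b, l in zip(ols, lam)]
```

In this sketch the shared predictor receives contributions from both groups in its log-penalty, and predictors in the same group are tied together only through their penalties, not through their coefficients, which is the sense in which smoothness sits at the penalty level.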

Article information

Bayesian Anal., Volume 9, Number 1 (2014), 221-258.

First available in Project Euclid: 24 February 2014

Keywords: Bayesian shrinkage estimation; EM algorithm; Bayesian LASSO; Minorization-maximization


Rockova, Veronika; Lesaffre, Emmanuel. Incorporating Grouping Information in Bayesian Variable Selection with Applications in Genomics. Bayesian Anal. 9 (2014), no. 1, 221--258. doi:10.1214/13-BA846.


