Bayesian Analysis

Hierarchical Shrinkage Priors for Regression Models

Jim Griffin and Phil Brown

Abstract

In some linear models, such as those with interactions, it is natural to include relationships between the regression coefficients in the analysis. In this paper, we consider how robust hierarchical continuous prior distributions can be used to express dependence between the size, but not the sign, of the regression coefficients; for example, to include ideas of heredity in the analysis of linear models with interactions. We develop a simple method for controlling the shrinkage of regression effects to zero at different levels of the hierarchy by considering the behaviour of the continuous prior at zero. Applications to linear models with interactions and to generalized additive models are used as illustrations.
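
As a rough sketch of the kind of hierarchical construction the abstract describes, the Python fragment below draws regression coefficients from a normal-gamma shrinkage prior in which an interaction's local variance is tied multiplicatively to the local variances of its parent main effects. This expresses dependence between the sizes, but not the signs, of the coefficients and enforces a form of strong heredity. The product form, the variable names, and the hyperparameter values are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative hyperparameters (assumed, not from the paper): a gamma
    # shape lam < 1 puts extra prior mass near zero, so the amount of
    # shrinkage at each level of the hierarchy is governed by the
    # behaviour of the continuous prior at zero.
    lam, gamma2 = 0.5, 1.0

    # Main effects: beta_j | psi_j ~ N(0, psi_j) with
    # psi_j ~ Gamma(lam, 2 * gamma2), i.e. a normal-gamma prior
    # on each main-effect coefficient.
    psi1, psi2 = rng.gamma(lam, 2.0 * gamma2, size=2)
    beta1 = rng.normal(0.0, np.sqrt(psi1))
    beta2 = rng.normal(0.0, np.sqrt(psi2))

    # Interaction: its variance is the product of the parents' local
    # variances and its own gamma multiplier, so shrinking either main
    # effect towards zero also shrinks the interaction (strong heredity),
    # while the sign of beta12 stays independent of the parents' signs.
    phi12 = rng.gamma(lam, 2.0 * gamma2)
    psi12 = psi1 * psi2 * phi12
    beta12 = rng.normal(0.0, np.sqrt(psi12))

Under weak heredity, an interaction should survive when only one of its parents is present; in this sketch that could be expressed by replacing the product psi1 * psi2 with, say, the larger of the two parent scales.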

Article information

Source
Bayesian Anal., Volume 12, Number 1 (2017), 135-159.

Dates
First available in Project Euclid: 19 January 2016

Permanent link to this document
https://projecteuclid.org/euclid.ba/1453211963

Digital Object Identifier
doi:10.1214/15-BA990

Mathematical Reviews number (MathSciNet)
MR3597570

Zentralblatt MATH identifier
1384.62225

Keywords
Bayesian regularization; interactions; structured priors; strong and weak heredity; generalized additive models; normal-gamma prior; normal-gamma-gamma prior; generalized beta mixture prior

Citation

Griffin, Jim; Brown, Phil. Hierarchical Shrinkage Priors for Regression Models. Bayesian Anal. 12 (2017), no. 1, 135--159. doi:10.1214/15-BA990. https://projecteuclid.org/euclid.ba/1453211963

