Bayesian Analysis

Selection of Tuning Parameters, Solution Paths and Standard Errors for Bayesian Lassos

Vivekananda Roy and Sounak Chakraborty

Abstract

Penalized regression methods such as the lasso and elastic net (EN) have become popular for simultaneous variable selection and coefficient estimation. Implementation of these methods requires selection of the penalty parameters. We propose an empirical Bayes (EB) methodology for selecting these tuning parameters, as well as for computing the regularization path plots. The EB method does not suffer from the “double shrinkage problem” of the frequentist EN, and it avoids the difficulty of constructing an appropriate prior on the penalty parameters. The EB methodology is implemented by an efficient importance sampling method based on multiple Gibbs sampler chains. Since the Markov chains underlying the Gibbs sampler are proved to be geometrically ergodic, the Markov chain central limit theorem can be used to provide asymptotically valid confidence bands for the profiles of the EN coefficients. The practical effectiveness of our method is illustrated by several simulation examples and two real-life case studies. Although this article considers the lasso and EN for brevity, the proposed EB method is general and can be used to select shrinkage parameters in other regularization methods.
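
To make the mechanics concrete, the following is a minimal, single-chain sketch in Python, assuming the Bayesian lasso hierarchy of Park and Casella (2008): a three-block Gibbs sampler is run at one reference penalty value, and since only the prior on the latent variances depends on the penalty parameter, the marginal likelihood ratio as a function of the penalty can be estimated by importance reweighting of the same draws and then maximized over a grid. The function names are illustrative, and the single-chain estimator is a simplification; the paper's method pools multiple Gibbs chains and provides MCMC standard errors.

    # A minimal, single-chain sketch (illustrative, not the paper's full method):
    # Gibbs sampling for the Bayesian lasso of Park and Casella (2008) at a
    # reference penalty lam0, then empirical Bayes selection of lambda by
    # importance reweighting of the same draws.
    import numpy as np

    def bayesian_lasso_gibbs(y, X, lam0, n_iter=5000, rng=None):
        """Three-block Gibbs sampler; returns sum(tau^2) per iteration,
        the only statistic the importance weights for lambda depend on."""
        rng = np.random.default_rng() if rng is None else rng
        n, p = X.shape
        XtX, Xty = X.T @ X, X.T @ y
        sig2, tau2 = 1.0, np.ones(p)
        tau2_sums = np.empty(n_iter)
        for t in range(n_iter):
            # beta | rest ~ N(A^{-1} X'y, sig2 * A^{-1}), A = X'X + diag(1/tau2)
            A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
            beta = rng.multivariate_normal(A_inv @ Xty, sig2 * A_inv)
            # sig2 | rest ~ Inverse-Gamma((n-1+p)/2, rate), with prior 1/sig2
            resid = y - X @ beta
            rate = 0.5 * (resid @ resid + beta @ (beta / tau2))
            sig2 = rate / rng.gamma(0.5 * (n - 1 + p))
            # 1/tau2_j | rest ~ Inverse-Gaussian(sqrt(lam0^2*sig2/beta_j^2), lam0^2)
            mu = np.sqrt(lam0**2 * sig2 / beta**2)
            tau2 = 1.0 / rng.wald(mu, lam0**2)
            tau2_sums[t] = tau2.sum()
        return tau2_sums

    def marginal_ratio(lam_grid, lam0, tau2_sums, p):
        """Importance sampling estimate of m(lam)/m(lam0) on a grid: only the
        prior (lam^2/2)^p exp(-lam^2 * sum(tau2)/2) depends on lam, so the
        weights are a function of sum(tau2) alone."""
        lam = np.asarray(lam_grid)[:, None]
        logw = (2 * p * (np.log(lam) - np.log(lam0))
                - 0.5 * (lam**2 - lam0**2) * tau2_sums[None, :])
        return np.exp(logw).mean(axis=1)

    # Toy usage: the EB estimate of lambda maximizes the estimated ratio.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((50, 5))
    y = X @ np.array([2.0, 0.0, 0.0, 1.0, 0.0]) + rng.standard_normal(50)
    tau2_sums = bayesian_lasso_gibbs(y, X, lam0=1.0, rng=rng)[1000:]  # burn-in
    grid = np.linspace(0.1, 5.0, 100)
    lam_hat = grid[np.argmax(marginal_ratio(grid, 1.0, tau2_sums, p=5))]
    print("EB estimate of lambda:", lam_hat)

Because the draws are reweighted rather than regenerated, the estimated marginal likelihood profile over the whole grid comes at essentially no extra sampling cost once the chains have been run; the same reweighting idea underlies the regularization path plots described in the abstract.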

Article information

Source
Bayesian Anal., Volume 12, Number 3 (2017), 753–778.

Dates
First available in Project Euclid: 7 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.ba/1473276258

Digital Object Identifier
doi:10.1214/16-BA1025

Mathematical Reviews number (MathSciNet)
MR3655875

Zentralblatt MATH identifier
1384.62102

Subjects
Primary: 62F15 (Bayesian inference); 62J07 (Ridge regression; shrinkage estimators)
Secondary: 60J05 (Discrete-time Markov processes on general state spaces)

Keywords
Bayesian lasso; elastic net; empirical Bayes; geometric ergodicity; importance sampling; Markov chain Monte Carlo; shrinkage; standard errors

Rights
Creative Commons Attribution 4.0 International License.

Citation

Roy, Vivekananda; Chakraborty, Sounak. Selection of Tuning Parameters, Solution Paths and Standard Errors for Bayesian Lassos. Bayesian Anal. 12 (2017), no. 3, 753–778. doi:10.1214/16-BA1025. https://projecteuclid.org/euclid.ba/1473276258


References

  • Andrews, D. F. and Mallows, C. L. (1974). “Scale mixtures of normal distributions.” Journal of the Royal Statistical Society, Series B, 36: 99–102.
  • Asmussen, S. and Glynn, P. W. (2011). “A new proof of convergence of MCMC via the ergodic theorem.” Statistics and Probability Letters, 81: 1482–1485.
  • Bondell, H. and Reich, B. (2008). “Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with Oscar.” Biometrics, 64: 115–123.
  • Buta, E. and Doss, H. (2011). “Computational approaches for empirical Bayes methods and Bayesian sensitivity analysis.” The Annals of Statistics, 39: 2658–2685.
  • Carvalho, C., Polson, N., and Scott, J. (2010). “The horseshoe estimator for sparse signals.” Biometrika, 97: 465–480.
  • Doss, H. (2010). “Estimation of large families of Bayes factors from Markov chain output.” Statistica Sinica, 20: 537–560.
  • Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). “Least angle regression.” The Annals of Statistics, 32: 407–499.
  • Fan, J. and Li, R. (2001). “Variable selection via nonconcave penalized likelihood and its oracle property.” Journal of the American Statistical Association, 96: 1348–1360.
  • Flegal, J. M. and Jones, G. L. (2010). “Batch means and spectral variance estimators in Markov chain Monte Carlo.” The Annals of Statistics, 38: 1034–1070.
  • Geyer, C. J. (1994). “Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo.” Technical report 568, School of Statistics, University of Minnesota.
  • Geyer, C. J. (1996). “Estimation and optimization of functions.” In Markov Chain Monte Carlo in Practice, 241–258. Boca Raton, FL: Chapman and Hall/CRC Press.
  • Hoerl, A. E. and Kennard, R. W. (1970). “Ridge regression: Biased estimation for nonorthogonal problems.” Technometrics, 12: 55–67.
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York: Springer.
  • Johnson, V. E. and Rossell, D. (2012). “Bayesian model selection in high-dimensional settings.” Journal of the American Statistical Association, 107: 649–660.
  • Jones, G. L. and Hobert, J. P. (2001). “Honest exploration of intractable probability distributions via Markov chain Monte Carlo.” Statistical Science, 16: 312–334.
  • Kass, R. E. and Raftery, A. E. (1995). “Bayes factors.” Journal of the American Statistical Association, 90: 773–795.
  • Khare, K. and Hobert, J. P. (2013). “Geometric ergodicity of Bayesian lasso.” Electronic Journal of Statistics, 7: 2150–2163.
  • Knight, K. and Fu, W. (2000). “Asymptotics for lasso-type estimators.” The Annals of Statistics, 28: 1356–1378.
  • Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). “Penalized regression, standard errors, and Bayesian lassos.” Bayesian Analysis, 5: 369–412.
  • Lee, K. H., Chakraborty, S., and Sun, J. (2015). “Survival prediction and variable selection with simultaneous shrinkage and grouping priors.” Statistical Analysis and Data Mining, 8: 114–127.
  • Li, Q. and Lin, N. (2010). “The Bayesian elastic net.” Bayesian Analysis, 5: 151–170.
  • Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. (2008). “Mixtures of $g$-priors for Bayesian variable selection.” Journal of the American Statistical Association, 103: 410–423.
  • Liu, F., Chakraborty, S., Li, F., Liu, Y., and Lozano, A. C. (2014). “Bayesian regularization via graph Laplacian.” Bayesian Analysis, 9: 449–474.
  • Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. London: Springer-Verlag.
  • Newton, M. and Raftery, A. (1994). “Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion).” Journal of the Royal Statistical Society, Series B, 56: 3–48.
  • Park, T. and Casella, G. (2008). “The Bayesian lasso.” Journal of the American Statistical Association, 103: 681–686.
  • Polson, N. and Scott, J. (2010). “Shrink globally, act locally: sparse Bayesian regularization and prediction.” Bayesian Statistics, 9: 501–538.
  • Roberts, G. O. and Rosenthal, J. S. (1997). “Geometric ergodicity and hybrid Markov chains.” Electronic Communications in Probability, 2: 13–25.
  • Rosenthal, J. S. (1995). “Minorization conditions and convergence rates for Markov chain Monte Carlo.” Journal of the American Statistical Association, 90: 558–566.
  • Roy, V. (2014). “Efficient estimation of the link function parameter in a robust Bayesian binary regression model.” Computational Statistics and Data Analysis, 73: 87–102.
  • Roy, V. and Chakraborty, S. (2016). “Supplementary material for “Selection of tuning parameters, solution paths and standard errors for Bayesian lassos”.” Bayesian Analysis.
  • Roy, V., Evangelou, E., and Zhu, Z. (2016). “Efficient estimation and prediction for the Bayesian binary spatial model with flexible link functions.” Biometrics, 72: 289–298.
  • Roy, V. and Hobert, J. P. (2007). “Convergence rates and asymptotic standard errors for MCMC algorithms for Bayesian probit regression.” Journal of the Royal Statistical Society, Series B, 69: 607–623.
  • Roy, V., Tan, A., and Flegal, J. (2015). “Estimating standard errors for importance sampling estimators with multiple Markov chains.” Technical report, Iowa State University.
  • Tibshirani, R. (1996). “Regression shrinkage and selection via the lasso.” Journal of the Royal Statistical Society, Series B, 58: 267–288.
  • Wolpert, R. L. and Schmidler, S. C. (2012). “$\alpha$-stable limit laws for harmonic mean estimators of marginal likelihoods.” Statistica Sinica, 22: 1233–1251.
  • Yuan, M. and Lin, Y. (2006). “Model selection and estimation in regression with grouped variables.” Journal of the Royal Statistical Society, Series B, 68: 49–67.
  • Zou, H. (2006). “The adaptive lasso and its oracle properties.” Journal of the American Statistical Association, 101: 1418–1429.
  • Zou, H. and Hastie, T. (2005). “Regularization and variable selection via the elastic net.” Journal of the Royal Statistical Society, Series B, 67: 301–320.

Supplemental materials

  • Supplementary Material for “Selection of Tuning Parameters, Solution Paths and Standard Errors for Bayesian Lassos”. The online supplementary materials contain the proofs of the lemmas, together with a summary of the steps involved in estimating the tuning parameters and the solution paths.