Electronic Journal of Statistics

Trace class Markov chains for the Normal-Gamma Bayesian shrinkage model

Liyuan Zhang, Kshitij Khare, and Zeren Xing


Abstract

High-dimensional data, where the number of variables exceeds or is comparable to the sample size, is now pervasive in many scientific applications. In recent years, Bayesian shrinkage models have been developed as effective and computationally feasible tools to analyze such data, especially in the context of linear regression. In this paper, we focus on the Normal-Gamma shrinkage model developed by Griffin and Brown [7]. This model subsumes the popular Bayesian lasso model, and a three-block Gibbs sampling algorithm to sample from the resulting intractable posterior distribution has been developed in [7]. We consider an alternative two-block Gibbs sampling algorithm, and rigorously demonstrate its advantage over the three-block sampler by comparing specific spectral properties. In particular, we show that the Markov operator corresponding to the two-block sampler is trace class (and hence Hilbert-Schmidt), whereas the operator corresponding to the three-block sampler is not even Hilbert-Schmidt. The trace class property for the two-block sampler implies geometric convergence for the associated Markov chain, which justifies the use of Markov chain CLTs to obtain practical error bounds for MCMC-based estimates. Additionally, it facilitates theoretical comparisons of the two-block sampler with sandwich algorithms, which aim to improve performance by inserting inexpensive extra steps in between the two conditional draws of the two-block sampler.
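
For readers less familiar with the operator terminology, the implication "trace class, hence Hilbert-Schmidt" can be sketched as follows. This is a standard fact, not a statement quoted from the paper, and the notation ($K$ for the Markov operator of the two-block sampler, $\lambda_i$ for its eigenvalues) is assumed here purely for illustration. Assuming $K$ is a positive, self-adjoint, compact operator on the space of mean-zero functions that are square-integrable with respect to the posterior, and that its eigenvalues satisfy $0 \le \lambda_i < 1$, one has

$$
\text{trace class: } \sum_i \lambda_i < \infty, \qquad \text{Hilbert-Schmidt: } \sum_i \lambda_i^2 < \infty, \qquad \lambda_i^2 \le \lambda_i \ \Longrightarrow\ \sum_i \lambda_i^2 \le \sum_i \lambda_i .
$$

Under the same assumptions, compactness means the eigenvalues can accumulate only at zero, so the largest eigenvalue is strictly below 1. The resulting spectral gap is what underlies geometric convergence.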

Note

When this article was first made public, on January 16, 2019, Kshitij Khare's funding information was left out of the article. The article was corrected on July 30, 2019.

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 166-207.

Dates
Received: March 2017
First available in Project Euclid: 16 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1547607848

Digital Object Identifier
doi:10.1214/18-EJS1491

Mathematical Reviews number (MathSciNet)
MR3899950

Zentralblatt MATH identifier
1407.60098

Subjects
Primary: 60J05: Discrete-time Markov processes on general state spaces; 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]
Secondary: 33C10: Bessel and Airy functions, cylinder functions, $_0F_1$

Keywords
Data Augmentation; Markov chain Monte Carlo; normal-Gamma model; Bayesian shrinkage; trace class operators

Rights
Creative Commons Attribution 4.0 International License.

Citation

Zhang, Liyuan; Khare, Kshitij; Xing, Zeren. Trace class Markov chains for the Normal-Gamma Bayesian shrinkage model. Electron. J. Statist. 13 (2019), no. 1, 166-207. doi:10.1214/18-EJS1491. https://projecteuclid.org/euclid.ejs/1547607848



References

  • [1] Abramowitz, M. and Stegun, I. A. (1965). Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, 1 ed., Dover Books on Mathematics. Dover Publications.
  • [2] Adamczak, R. and Bednorz, W. (2015). Some remarks on MCMC estimation of spectra of integral operators, Bernoulli 21, 2073-2092.
  • [3] Chakraborty, S. and Khare, K. (2018). Consistent estimation of the spectrum of trace class data augmentation algorithms, arXiv.
  • [4] de los Campos, G., Naya, H., Gianola, D., Crossa, J., Legarra, A., Manfredi, E., Weigel, K. and Cotes, J.M. (2009). Predicting quantitative traits with regression models for dense molecular markers and pedigree, Genetics 182, 375-385.
  • [5] DeMiguel, V., Garlappi, L., Nogales, F. J. and Uppal, R. (2009). A generalized approach to portfolio optimization: improving performance by constraining portfolio norms, Management Science 55, 798-812.
  • [6] Fiedler, M. (1971). Bounds for the Determinant of the Sum of Hermitian Matrices, Proceedings of the American Mathematical Society 30, 27-31.
  • [7] Griffin, J. E. and Brown, P. J. (2010). Inference with normal-gamma prior distributions in regression problems, Bayesian Analysis 5, 171-188.
  • [8] Gu, X., Yin, G. and Lee, J. J. (2013). Bayesian two-step Lasso strategy for biomarker selection in personalized medicine development for time-to-event endpoints, Contemporary Clinical Trials 36, 642-650.
  • [9] Hobert, J. P. and Marchev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms, The Annals of Statistics 36, 532-554.
  • [10] Hobert, J. P., Roy, V. and Robert, C. P. (2011). Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modeling, Statistical Science 26, 332-351.
  • [11] Jacquemin, S. J. and Doll, J. C. (2014). Body size and geographic range do not explain long term variation in fish populations: a Bayesian phylogenetic approach to testing assembly processes in stream fish assemblages, PLoS ONE 9, 1-7.
  • [12] Johndrow, J. E., Orenstein, P. and Bhattacharya, A. (2018). Scalable MCMC for Bayes shrinkage priors, arXiv.
  • [13] Jorgens, K. (1982). Linear Integral Operators, Pitman Books, London.
  • [14] Khare, K. and Hobert, J. P. (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants, The Annals of Statistics 39, 2585-2606.
  • [15] Kotz, S. and Nadarajah, S. (2004). Multivariate $t$ Distributions and Their Applications, Cambridge University Press, Cambridge.
  • [16] Laforgia, A. (1991). Bounds for modified Bessel functions, Journal of Computational and Applied Mathematics 34, 263-267.
  • [17] Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation, J. Amer. Statist. Assoc. 94, 1264-1274.
  • [18] Marchev, D. and Hobert, J. P. (2004). Geometric ergodicity of van Dyk and Meng’s algorithm for the multivariate Student’s t model, Journal of the American Statistical Association 99, 228-238.
  • [19] Meng, X. L. and van Dyk, D. A. (1999). Seeking efficient data augmentation schemes via conditional and marginal augmentation, Biometrika 86, 301-320.
  • [20] Mishchenko, Y. and Paninski, L. (2012). A Bayesian compressed-sensing approach for reconstructing neural connectivity from subsampled anatomical data, Journal of Computational Neuroscience 33, 371-388.
  • [21] Pal, S. and Khare, K. (2014). Geometric ergodicity for Bayesian shrinkage models, Electronic Journal of Statistics 8, 604-645.
  • [22] Pal, S., Khare, K. and Hobert, J. P. (2016). Trace class Markov chains for Bayesian inference with generalized double Pareto shrinkage priors, Scandinavian Journal of Statistics, doi: 10.1111/sjos.12254.
  • [23] Park, T. and Casella, G. (2008). The Bayesian Lasso, Journal of the American Statistical Association 103, 681-686.
  • [24] Perez, P. and de los Campos, G. (2014). Genome-Wide regression and prediction with the BGLR statistical package, Genetics 198, 483-495.
  • [25] Pong-Wong, R. (2014). Estimation of genomic breeding values using the Horseshoe prior, BMC Proc. 8, Suppl 5.
  • [26] Pong-Wong, R. and Woolliams, J. (2014). Bayes U: A genomic prediction method based on the Horseshoe prior, World Congress of Genetics Applied to Livestock Production 10, Vancouver, Canada.
  • [27] Qin, Q., Hobert, J. and Khare, K. (2018). Estimating the spectral gap of a trace-class Markov operator, arXiv.
  • [28] Ruiz-Antolin, D. and Segura, J. (2016). A new type of sharp bounds for ratios of modified Bessel functions, arXiv:1606.02008.
  • [29] Rajaratnam, B., Sparks, D., Khare, K. and Zhang, L. (2016). Scalable Bayesian shrinkage and uncertainty quantification for high-dimensional regression, arXiv:1509.03697, Cornell University Library.
  • [30] Tibshirani, R. (1994). Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B 58, 267-288.
  • [31] Vats, D., Flegal, J. and Jones, G. (2018). Multivariate output analysis for Markov Chain Monte Carlo, to appear in Biometrika.
  • [32] Watson, G. N. (1944). A Treatise on the Theory of Bessel Functions, Cambridge University Press, Cambridge, 2nd edition.
  • [33] Xing, Z., Zhou, M., Castrodad, A., Sapiro, G. and Carin, L. (2012). Dictionary learning for noisy and incomplete hyperspectral images, SIAM Journal of Imaging Sciences 5, 33-56.
  • [34] Yi, N. and Xu, S. (2008). Bayesian LASSO for quantitative trait loci mapping, Genetics 179, 1045-1055.