Bernoulli, Volume 23, Number 3 (2017), 1822-1847.

Empirical Bayes posterior concentration in sparse high-dimensional linear models

Ryan Martin, Raymond Mess, and Stephen G. Walker


We propose a new empirical Bayes approach for inference in the $p\gg n$ normal linear model. The novelty is the use of data in the prior in two ways, for centering and regularization. Under suitable sparsity assumptions, we establish a variety of concentration rate results for the empirical Bayes posterior distribution, relevant for both estimation and model selection. Computation is straightforward and fast, and simulation results demonstrate the strong finite-sample performance of the empirical Bayes model selection procedure.

Article information

Received: September 2015
Revised: December 2015
First available in Project Euclid: 17 March 2017

Keywords: data-dependent prior; fractional likelihood; minimax; regression; variable selection


Martin, Ryan; Mess, Raymond; Walker, Stephen G. Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 23 (2017), no. 3, 1822--1847. doi:10.3150/15-BEJ797.


