Electronic Journal of Statistics

Data-driven priors and their posterior concentration rates

Ryan Martin and Stephen G. Walker



In high-dimensional problems, choosing a prior distribution such that the corresponding posterior has desirable practical and theoretical properties can be challenging. This raises the question: can the data be used to help choose a prior? In this paper, we develop a general strategy for constructing a data-driven, or empirical, prior, together with sufficient conditions under which the corresponding posterior distribution achieves a given concentration rate. The idea is that the prior should put sufficient mass on parameter values for which the likelihood is large. An interesting byproduct of this data-driven centering is that the asymptotic properties of the posterior are less sensitive to the shape of the prior, which, in turn, allows users to work with priors of computationally convenient forms while maintaining the desired rates. General results on both adaptive and non-adaptive rates based on empirical priors are presented, along with illustrations in density estimation, nonparametric regression, and high-dimensional normal models.
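To make the data-driven centering idea concrete, consider a toy conjugate Gaussian example. This is an illustrative sketch, not the paper's general construction: for an observation X ~ N(θ, σ²), take an empirical prior N(X, v) centered at the observation itself, and combine it with the likelihood raised to a fractional power α < 1 (a regularization device used in parts of the empirical-prior literature to offset the double use of the data). The function name and the parameters `alpha` and `prior_var` below are assumptions chosen for illustration.

```python
def data_centered_posterior(x, alpha=0.9, prior_var=10.0, obs_var=1.0):
    """Fractional posterior for theta in X ~ N(theta, obs_var) under the
    empirical prior N(x, prior_var), a Gaussian prior centered at the
    observed x itself.  The posterior is proportional to
        N(x | theta, obs_var)**alpha * N(theta | x, prior_var),
    which by normal-normal conjugacy is Gaussian with the precision
    and mean computed below."""
    post_prec = alpha / obs_var + 1.0 / prior_var
    post_var = 1.0 / post_prec
    # Likelihood and prior are both centered at x, so the posterior mean is x:
    post_mean = post_var * (alpha * x / obs_var + x / prior_var)
    return post_mean, post_var

# The posterior stays centered at the observation whatever the prior spread,
# and it concentrates as the likelihood sharpens; e.g., averaging n i.i.d.
# observations corresponds to obs_var = sigma**2 / n below.
m1, v1 = data_centered_posterior(2.5)                     # mean 2.5, var 1.0
m2, v2 = data_centered_posterior(2.5, obs_var=1.0 / 100)  # var shrinks ~ 1/n
```

The point of the sketch is the insensitivity to prior shape noted in the abstract: because the prior puts mass where the likelihood is large, the posterior's center is never pulled away from the likelihood's peak, and only the concentration rate is at stake.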

Article information

Electron. J. Statist., Volume 13, Number 2 (2019), 3049-3081.

Received: June 2018
First available in Project Euclid: 20 September 2019

Primary: 62C12 (empirical decision procedures; empirical Bayes procedures), 62E20 (asymptotic distribution theory)
Secondary: 62G07 (density estimation), 62G08 (nonparametric regression)

Keywords: Adaptation; data-dependent prior; density estimation; empirical Bayes; nonparametric regression

Creative Commons Attribution 4.0 International License.


Martin, Ryan; Walker, Stephen G. Data-driven priors and their posterior concentration rates. Electron. J. Statist. 13 (2019), no. 2, 3049–3081. doi:10.1214/19-EJS1600. https://projecteuclid.org/euclid.ejs/1568944885



References

  • Arbel, J., Gayraud, G. and Rousseau, J. (2013). Bayesian optimal adaptive estimation using a sieve prior. Scand. J. Stat. 40 549–570.
  • Arias-Castro, E. and Lounici, K. (2014). Estimation and variable selection with exponential weights. Electron. J. Stat. 8 328–354.
  • Armagan, A., Dunson, D. B. and Lee, J. (2013). Generalized double Pareto shrinkage. Statist. Sinica 23 119–143.
  • Barron, A. (1988). The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Technical Report No. 7, Department of Statistics, University of Illinois, Champaign, IL.
  • Belitser, E. (2017). On coverage and local radial rates of credible sets. Ann. Statist. 45 1124–1151.
  • Belitser, E. and Ghosal, S. (2019). Empirical Bayes oracle uncertainty quantification. Ann. Statist., to appear, http://www4.stat.ncsu.edu/~ghoshal/papers/oracle_regression.pdf.
  • Belitser, E. and Nurushev, N. (2017). Needles and straw in a haystack: robust confidence for possibly sparse sequences. Unpublished manuscript, arXiv:1511.01803.
  • Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, second ed. Springer-Verlag, New York.
  • Bhadra, A., Datta, J., Polson, N. G. and Willard, B. (2017). The horseshoe+ estimator of ultra-sparse signals. Bayesian Anal. 12 1105–1131.
  • Bhattacharya, A., Pati, D., Pillai, N. S. and Dunson, D. B. (2015). Dirichlet-Laplace priors for optimal shrinkage. J. Amer. Statist. Assoc. 110 1479–1490.
  • Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis. Monographs on Statistics and Applied Probability 69. Chapman & Hall, London.
  • Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
  • Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
  • Donnet, S., Rivoirard, V., Rousseau, J. and Scricciolo, C. (2018). Posterior concentration rates for empirical Bayes procedures with applications to Dirichlet process mixtures. Bernoulli 24 231–256.
  • Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. Roy. Statist. Soc. Ser. B 54 41–81. With discussion and a reply by the authors.
  • Efron, B. (2010). Large-Scale Inference. Institute of Mathematical Statistics Monographs 1. Cambridge University Press, Cambridge.
  • Gao, C. and Zhou, H. H. (2016). Rate exact Bayesian adaptation with modified block priors. Ann. Statist. 44 318–345.
  • Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
  • Ghosal, S. and van der Vaart, A. W. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29 1233–1263.
  • Ghosal, S. and van der Vaart, A. W. (2007a). Posterior convergence rates of Dirichlet mixtures at smooth densities. Ann. Statist. 35 697–723.
  • Ghosal, S. and van der Vaart, A. (2007b). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist. 35 192–223.
  • Ghosal, S. and van der Vaart, A. (2017). Fundamentals of Nonparametric Bayesian Inference. Cambridge Series in Statistical and Probabilistic Mathematics 44. Cambridge University Press, Cambridge.
  • Kruijer, W., Rousseau, J. and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat. 4 1225–1257.
  • Lee, K., Lee, J. and Lin, L. (2017). Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors. Ann. Statist., to appear, arXiv:1811.06198.
  • Martin, R. (2017). Invited comment on the article by van der Pas, Szabó, and van der Vaart. Bayesian Anal. 12 1254–1258.
  • Martin, R. (2018). Empirical priors and posterior concentration rates for a monotone density. Sankhya A, to appear, arXiv:1706.08567.
  • Martin, R., Mess, R. and Walker, S. G. (2017). Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 23 1822–1847.
  • Martin, R. and Ning, B. (2019). Empirical priors and coverage of posterior credible sets in a sparse normal mean model. arXiv:1812.02150.
  • Martin, R. and Shen, W. (2017). Asymptotically optimal empirical Bayes inference in a piecewise constant sequence model. arXiv:1712.03848.
  • Martin, R. and Tang, Y. (2019). Empirical priors for prediction in sparse high-dimensional linear regression. arXiv:1903.00961.
  • Martin, R. and Walker, S. G. (2014). Asymptotically minimax empirical Bayes estimation of a sparse normal mean vector. Electron. J. Stat. 8 2188–2206.
  • Petrone, S., Rousseau, J. and Scricciolo, C. (2014). Bayes and empirical Bayes: do they merge? Biometrika 101 285–302.
  • Rousseau, J. and Szabo, B. (2017). Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator. Ann. Statist. 45 833–865.
  • Salomond, J.-B. (2014). Concentration rate and consistency of the posterior distribution for selected priors under monotonicity constraints. Electron. J. Stat. 8 1380–1404.
  • Scricciolo, C. (2007). On rates of convergence for Bayesian density estimation. Scand. J. Statist. 34 626–642.
  • Scricciolo, C. (2015). Bayesian adaptation. J. Statist. Plann. Inference 166 87–101.
  • Shen, W. and Ghosal, S. (2015). Adaptive Bayesian procedures using random series priors. Scand. J. Stat. 42 1194–1213.
  • Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist. 29 687–714.
  • Szabó, B. T., van der Vaart, A. W. and van Zanten, J. H. (2013). Empirical Bayes scaling of Gaussian priors in the white noise model. Electron. J. Stat. 7 991–1018.
  • van der Pas, S., Szabó, B. and van der Vaart, A. (2017a). Uncertainty quantification for the horseshoe (with discussion). Bayesian Anal. 12 1221–1274. With a rejoinder by the authors.
  • van der Pas, S., Szabó, B. and van der Vaart, A. (2017b). Adaptive posterior contraction rates for the horseshoe. Electron. J. Stat. 11 3196–3225.
  • van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist. 37 2655–2675.
  • van Erven, T. and Harremoës, P. (2014). Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory 60 3797–3820.
  • Walker, S. and Hjort, N. L. (2001). On Bayesian consistency. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 811–821.
  • Walker, S. G., Lijoi, A. and Prünster, I. (2007). On rates of convergence for posterior distributions in infinite-dimensional models. Ann. Statist. 35 738–746.