The Annals of Statistics

On the contraction properties of some high-dimensional quasi-posterior distributions

Yves A. Atchadé


We study the contraction properties of a quasi-posterior distribution $\check{\Pi}_{n,d}$ obtained by combining a quasi-likelihood function and a sparsity-inducing prior distribution on $\mathbb{R}^{d}$, as both $n$ (the sample size) and $d$ (the dimension of the parameter) increase. We derive some general results that highlight a set of sufficient conditions under which $\check{\Pi}_{n,d}$ puts increasingly high probability on sparse subsets of $\mathbb{R}^{d}$ and contracts toward the true value of the parameter. We apply these results to the analysis of logistic regression models and binary graphical models in high-dimensional settings. For the logistic regression model, we show that for well-behaved design matrices, the posterior distribution contracts at the rate $O(\sqrt{s_{\star}\log(d)/n})$, where $s_{\star}$ is the number of nonzero components of the parameter. For the binary graphical model, under some regularity conditions, we show that a quasi-posterior analog of the neighborhood selection of [Ann. Statist. 34 (2006) 1436–1462] contracts in the Frobenius norm at the rate $O(\sqrt{(p+S)\log(p)/n})$, where $p$ is the number of nodes and $S$ is the number of edges of the true graph.
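To give a feel for the two rates quoted in the abstract, the following is a minimal numerical sketch. It evaluates only the order of each rate, with the unspecified constant inside $O(\cdot)$ set to 1 as an assumption; the function names `logistic_rate` and `graphical_rate` and the sample values are illustrative and do not come from the paper.

```python
import math


def logistic_rate(s_star: int, d: int, n: int) -> float:
    """Order of the logistic-regression rate, sqrt(s_star * log(d) / n).

    The constant hidden in O(.) is taken to be 1 (an assumption).
    """
    return math.sqrt(s_star * math.log(d) / n)


def graphical_rate(p: int, S: int, n: int) -> float:
    """Order of the binary-graphical-model rate, sqrt((p + S) * log(p) / n).

    Again the constant hidden in O(.) is taken to be 1 (an assumption).
    """
    return math.sqrt((p + S) * math.log(p) / n)


if __name__ == "__main__":
    # Hypothetical settings: sparsity 10, sample size 5000, growing dimension.
    for d in (100, 10_000, 1_000_000):
        print(f"d={d:>9}  logistic rate order = {logistic_rate(10, d, 5000):.4f}")

    # Hypothetical graph: 50 nodes, 100 edges, sample size 5000.
    print(f"graphical rate order = {graphical_rate(50, 100, 5000):.4f}")
```

Under these assumptions the dimension enters only through $\log(d)$, so squaring $d$ inflates the rate order by a factor of $\sqrt{2}$, whereas the sparsity $s_{\star}$ and the sample size $n$ enter polynomially.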

Article information

Ann. Statist., Volume 45, Number 5 (2017), 2248-2273.

Received: September 2015
Revised: September 2016
First available in Project Euclid: 31 October 2017

Primary: 62F15 (Bayesian inference); 62Jxx (Linear inference, regression)

Quasi-Bayesian inference; high-dimensional inference; Bayesian asymptotics; logistic regression models; discrete graphical models


Atchadé, Yves A. On the contraction properties of some high-dimensional quasi-posterior distributions. Ann. Statist. 45 (2017), no. 5, 2248–2273. doi:10.1214/16-AOS1526.


  • [1] Alquier, P. and Lounici, K. (2011). PAC-Bayesian bounds for sparse regression estimation with exponential weights. Electron. J. Stat. 5 127–145.
  • [2] Arias-Castro, E. and Lounici, K. (2014). Estimation and variable selection with exponential weights. Electron. J. Stat. 8 328–354.
  • [3] Atchadé, Y. F. (2014). Estimation of high-dimensional partially-observed discrete Markov random fields. Electron. J. Stat. 8 2242–2263.
  • [4] Atchadé, Y. F. (2015). A Moreau-Yosida approximation scheme for high-dimensional posterior and quasi-posterior distributions. Available at arXiv:1505.07072.
  • [5] Atchadé, Y. F. (2015). A scalable quasi-Bayesian framework for Gaussian graphical models. Available at arXiv:1512.07934.
  • [6] Atchadé, Y. F. (2017). Supplement to “On the contraction properties of some high-dimensional quasi-posterior distributions.” DOI:10.1214/16-AOS1526SUPP.
  • [7] Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
  • [8] Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. J. Multivariate Anal. 136 147–162.
  • [9] Barber, R. F. and Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electron. J. Stat. 9 567–607.
  • [10] Baricz, A. (2008). Mills’ ratio: Monotonicity patterns and functional inequalities. J. Math. Anal. Appl. 340 1362–1370.
  • [11] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36 192–236.
  • [12] Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist. 43 1986–2018.
  • [13] Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
  • [14] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Lecture Notes in Math. 1851. Springer, Berlin.
  • [15] Chernozhukov, V. and Hong, H. (2003). An MCMC approach to classical estimation. J. Econometrics 115 293–346.
  • [16] Dalalyan, A. S. and Tsybakov, A. B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory. Lecture Notes in Computer Science 4539 97–111. Springer, Berlin.
  • [17] Florens, J.-P. and Simoni, A. (2012). Nonparametric estimation of an instrumental regression: A quasi-Bayesian approach based on regularized posterior. J. Econometrics 170 458–475.
  • [18] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer Series in Statistics. Springer, New York.
  • [19] Höfling, H. and Tibshirani, R. (2009). Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. J. Mach. Learn. Res. 10 883–906.
  • [20] Kato, K. (2013). Quasi-Bayesian analysis of nonparametric instrumental variables models. Ann. Statist. 41 2359–2390.
  • [21] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34 837–877.
  • [22] Li, C. and Jiang, W. (2014). Model selection for likelihood-free Bayesian methods based on moment conditions: Theory and numerical examples. Available at arXiv:1405.6693v1.
  • [23] Li, Y.-H., Scarlett, J., Ravikumar, P. and Cevher, V. (2014). Sparsistency of $\ell_{1}$-regularized M-estimators. Preprint. Available at arXiv:1410.7605v1.
  • [24] Liao, Y. and Jiang, W. (2011). Posterior consistency of nonparametric conditional moment restricted models. Ann. Statist. 39 3003–3031.
  • [25] Marin, J.-M., Pudlo, P., Robert, C. P. and Ryder, R. J. (2012). Approximate Bayesian computational methods. Stat. Comput. 22 1167–1180.
  • [26] McAllester, D. A. (1999). Some PAC-Bayesian theorems. Mach. Learn. 37 355–363.
  • [27] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs with the lasso. Ann. Statist. 34 1436–1462.
  • [28] Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83 1023–1036.
  • [29] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
  • [30] Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_{1}$-regularized logistic regression. Ann. Statist. 38 1287–1319.
  • [31] Schreck, A., Fort, G., Le Corff, S. and Moulines, E. (2013). A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection. Available at arXiv:1312.5658.
  • [32] Sun, T. and Zhang, C.-H. (2013). Sparse matrix inversion with scaled lasso. J. Mach. Learn. Res. 14 3385–3418.
  • [33] Yang, W. and He, X. (2012). Bayesian empirical likelihood for quantile regression. Ann. Statist. 40 1102–1131.
  • [34] Zhang, T. (2006). From $\varepsilon $-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.

Supplemental materials

  • Supplement to “On the contraction properties of some high-dimensional quasi-posterior distributions”. The supplementary material contains the proofs of Theorems 4, 9 and 10.