Electronic Journal of Statistics

Two simple examples for understanding posterior p-values whose distributions are far from uniform

Andrew Gelman

Full-text: Open access


Posterior predictive $p$-values do not in general have uniform distributions under the null hypothesis (except in the special case of ancillary test variables) but instead tend to have distributions more concentrated near 0.5. From different perspectives, such nonuniform distributions have been portrayed as desirable (as reflecting an ability of vague prior distributions to nonetheless yield accurate posterior predictions) or undesirable (as making it more difficult to reject a false model). We explore this tension through two simple normal-distribution examples. In one example, we argue that the low power of the posterior predictive check is desirable from a statistical perspective; in the other, the posterior predictive check seems inappropriate. Our conclusion is that the relevance of the $p$-value depends on the applied context, a point which (ironically) can be seen even in these two toy examples.
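The concentration near 0.5 can be seen in a small simulation. The sketch below is illustrative only and rests on assumptions not stated in this abstract: a single observation y ~ N(theta, 1), a prior theta ~ N(0, A^2), and the test variable T(y) = y. The function names and the parameter `prior_sd` (standing for A) are hypothetical. Under these assumptions the posterior predictive distribution is normal, so the p-value Pr(y_rep > y | y) is available in closed form, and its distribution can be examined by drawing (theta, y) repeatedly from the model.

```python
import random
from statistics import NormalDist


def posterior_predictive_pvalue(y, prior_sd):
    """Pr(y_rep > y | y) under the assumed model
    y ~ N(theta, 1), theta ~ N(0, prior_sd^2), test variable T(y) = y."""
    a2 = prior_sd ** 2
    shrink = a2 / (a2 + 1.0)         # posterior mean of theta is shrink * y
    pred_mean = shrink * y
    pred_sd = (1.0 + shrink) ** 0.5  # posterior variance of theta (= shrink) plus unit noise
    return 1.0 - NormalDist(pred_mean, pred_sd).cdf(y)


def simulate_pvalues(prior_sd, n_sims=20000, seed=0):
    """Draw (theta, y) from the assumed model and return the p-value for each draw."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, prior_sd)
        y = rng.gauss(theta, 1.0)
        pvals.append(posterior_predictive_pvalue(y, prior_sd))
    return pvals


def fraction_in_middle(pvals, lo=0.25, hi=0.75):
    """Share of p-values strictly between lo and hi (0.5 for a uniform distribution)."""
    return sum(lo < p < hi for p in pvals) / len(pvals)


if __name__ == "__main__":
    # prior_sd = 0 fixes theta at 0, so the p-value is an ordinary uniform p-value;
    # larger prior_sd pulls the distribution of p-values toward 0.5.
    for prior_sd in (0.0, 1.0, 10.0):
        share = fraction_in_middle(simulate_pvalues(prior_sd))
        print(f"prior_sd={prior_sd:5.1f}  share of p-values in (0.25, 0.75): {share:.3f}")
```

In this assumed setup the p-value simplifies to Phi(-z / sqrt(2A^2 + 1)), where z is standard normal under the model, so the distribution is uniform when A = 0 and increasingly concentrated near 0.5 as the prior becomes more vague.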

Article information

Electron. J. Statist., Volume 7 (2013), 2595-2602.

Received: February 2013
First available in Project Euclid: 22 October 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1382448225

Digital Object Identifier
doi:10.1214/13-EJS854

Subjects
Primary:
62F15: Bayesian inference
62C10: Bayesian problems; characterization of Bayes procedures
62F03: Hypothesis testing

Keywords
Bayesian inference, model checking, posterior predictive check, p-value, u-value


Gelman, Andrew. Two simple examples for understanding posterior p-values whose distributions are far from uniform. Electron. J. Statist. 7 (2013), 2595--2602. doi:10.1214/13-EJS854. https://projecteuclid.org/euclid.ejs/1382448225

