Electronic Journal of Statistics

Asymptotic minimum scoring rule prediction

Federica Giummolè and Valentina Mameli

Full-text: Open access


Most methods currently used in forecasting problems are based on scoring rules. Each scoring rule has an associated divergence function, which can be used as a measure of discrepancy between probability distributions. This approach is commonly used in the literature to compare two competing predictive distributions on the basis of their relative expected divergence from the true distribution.
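As a minimal illustration of how a divergence arises from a scoring rule, the sketch below assumes the loss convention (smaller scores are better) and discrete distributions given as lists of probabilities. The divergence is taken to be the expected score of the forecast $q$ under the true distribution $p$, minus the expected score of $p$ itself. The function names and the example distributions are hypothetical and are not taken from the paper.

```python
import math

def log_score(q, x):
    """Logarithmic score as a loss: minus the log forecast probability of outcome x."""
    return -math.log(q[x])

def brier_score(q, x):
    """Quadratic (Brier) score: squared distance between q and the indicator of x."""
    return sum((qj - (1.0 if j == x else 0.0)) ** 2 for j, qj in enumerate(q))

def expected_score(score, p, q):
    """Expected score of forecast q when outcomes are drawn from p."""
    return sum(px * score(q, x) for x, px in enumerate(p) if px > 0)

def divergence(score, p, q):
    """Divergence induced by a scoring rule: excess expected score of q over p."""
    return expected_score(score, p, q) - expected_score(score, p, p)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Closed forms for comparison: Kullback-Leibler divergence and squared Euclidean distance.
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
sq = sum((pi - qi) ** 2 for pi, qi in zip(p, q))

print(divergence(log_score, p, q), kl)
print(divergence(brier_score, p, q), sq)
```

Under these assumptions the logarithmic score induces the Kullback-Leibler divergence and the Brier score induces the squared Euclidean distance between the probability vectors, so each printed pair should agree up to rounding.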

In this paper we focus on the use of scoring rules as a tool for finding predictive distributions for an unknown quantity of interest. The proposed predictive distributions are asymptotic modifications of the estimative solutions, obtained by minimizing the expected divergence related to a general scoring rule.

The asymptotic properties of such predictive distributions are closely related to the geometry induced by the considered divergence on a regular parametric model. In particular, the existence of a globally optimal predictive distribution is guaranteed for invariant divergences, whose local behaviour is similar to that of the well-known $\alpha $-divergences.
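For reference, one common convention for the $\alpha $-divergence between two densities $p$ and $q$ with respect to a dominating measure $\mu $ is the following. This is Amari's parametrization, recalled here as an assumption about notation; the paper's own convention may differ.

$$
D_{\alpha }(p, q) = \frac{4}{1-\alpha ^{2}} \left( 1 - \int p^{\frac{1-\alpha }{2}} \, q^{\frac{1+\alpha }{2}} \, d\mu \right), \qquad \alpha \neq \pm 1 .
$$

Under this convention the limit $\alpha \to -1$ gives the Kullback-Leibler divergence $\int p \log (p/q) \, d\mu $, and the limit $\alpha \to 1$ gives the same quantity with the roles of $p$ and $q$ exchanged.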

We show that a wide class of divergences obtained from weighted scoring rules shares invariance properties with $\alpha $-divergences. For weighted scoring rules it is thus possible to obtain a global solution to the prediction problem. Unfortunately, the divergences associated with many widely used scoring rules are not invariant. Still, for these cases we provide a locally optimal predictive distribution within a specified parametric model.

Article information

Electron. J. Statist., Volume 12, Number 2 (2018), 2401-2429.

Received: June 2017
First available in Project Euclid: 25 July 2018

Primary: 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]
Secondary: 60G25: Prediction theory [See also 62M20]

Keywords: $\alpha $-connection; Fisher metric; Kullback-Leibler divergence; monotone and regular divergence; predictive density; scoring rule; weighted scoring rule

Creative Commons Attribution 4.0 International License.


Giummolè, Federica; Mameli, Valentina. Asymptotic minimum scoring rule prediction. Electron. J. Statist. 12 (2018), no. 2, 2401–2429. doi:10.1214/18-EJS1454. https://projecteuclid.org/euclid.ejs/1532484334


