Open Access
Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination
Tilmann Gneiting, Johannes Resin
Electron. J. Statist. 17(2): 3226-3286 (2023). DOI: 10.1214/23-EJS2180


A common principle in model diagnostics and forecast evaluation is that fitted or predicted distributions ought to be reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary responses, auto-calibration is the universal concept of reliability. For real-valued outcomes, a general theory of calibration has been elusive, despite a recent surge of interest in distributional regression and machine learning. We develop a framework rooted in probability theory, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. In a nutshell, a prediction is conditionally T-calibrated if it can be taken at face value in terms of an identifiable functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration, discrimination, and uncertainty. In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination that nests and reinterprets the classical R² in least squares regression and its natural analog R¹ in quantile regression, yet applies to T-regression in general.
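As a rough illustration of the empirical recipe mentioned in the abstract, the following Python sketch recalibrates point forecasts with the pool-adjacent-violators algorithm and splits the mean score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components. It assumes that T is the mean functional and that squared error is the scoring function; the function names `pav`, `mse`, and `decompose`, the identity score = MCB - DSC + UNC, and the ratio (DSC - MCB) / UNC used for the coefficient of determination are choices made for this sketch and are not taken from the text above.

```python
def pav(x, y):
    """Nondecreasing least-squares fit of y on x (pool-adjacent-violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    # Pool tied forecast values first so equal x share one fitted value.
    groups = []  # each entry: [sum of y, count]
    last_x = None
    for i in order:
        if groups and x[i] == last_x:
            groups[-1][0] += y[i]
            groups[-1][1] += 1
        else:
            groups.append([y[i], 1])
            last_x = x[i]
    # Merge adjacent blocks while their means violate monotonicity.
    blocks = []
    for s, n in groups:
        blocks.append([s, n])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s_last, n_last = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += n_last
    # Expand block means back to the original order of the inputs.
    fitted_sorted = []
    for s, n in blocks:
        fitted_sorted.extend([s / n] * n)
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


def mse(pred, y):
    """Mean squared error, the scoring function assumed in this sketch."""
    return sum((p - v) ** 2 for p, v in zip(pred, y)) / len(y)


def decompose(x, y):
    """Split the mean score of forecasts x into MCB, DSC, and UNC."""
    x_rc = pav(x, y)                 # recalibrated forecasts
    ref = [sum(y) / len(y)] * len(y)  # constant reference forecast (mean of y)
    s_x, s_rc, s_ref = mse(x, y), mse(x_rc, y), mse(ref, y)
    mcb = s_x - s_rc    # miscalibration: gain from recalibrating x
    dsc = s_ref - s_rc  # discrimination: gain of recalibrated x over reference
    unc = s_ref         # uncertainty: score of the reference forecast
    # Assumes unc > 0, i.e. the outcomes are not all identical.
    return {"score": s_x, "MCB": mcb, "DSC": dsc, "UNC": unc,
            "R_star": (dsc - mcb) / unc}


parts = decompose([1, 2, 3, 4], [1, 3, 2, 4])
print(parts)
```

With squared error and the mean of the outcomes as reference, the ratio computed here equals one minus the mean squared error of the forecasts divided by that of the reference, which is the familiar form of R² in least squares regression.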

Funding Statement

Our research has been funded by the Klaus Tschira Foundation. Johannes Resin gratefully acknowledges support from the German Research Foundation (DFG) through grant number 502572912.


Acknowledgments

The authors would like to thank Sebastian Arnold, Fadoua Balabdaoui-Mohr, Jonas Brehmer, Frank Diebold, Timo Dimitriadis, Uwe Ehret, Andreas Fink, Tobias Fissler, Rafael Frongillo, Norbert Henze, Alexander Henzi, Alexander I. Jordan, Kristof Kraus, Fabian Krüger, Sebastian Lerch, Michael Maier-Gerber, Anja Mühlemann, Jim Pitman, Marc-Oliver Pohle, Roopesh Ranjan, Benedikt Schulz, Ville Satopää, Daniel Wolffram, and Johanna F. Ziegel, as well as anonymous reviewers, for helpful comments and discussion.



Tilmann Gneiting, Johannes Resin. "Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination." Electron. J. Statist. 17(2): 3226-3286, 2023.


Received: 1 October 2022; Published: 2023
First available in Project Euclid: 20 November 2023

arXiv: 2108.03210
Digital Object Identifier: 10.1214/23-EJS2180

Primary: 62G99, 62J20

Keywords: Calibration test, canonical loss, consistent scoring function, model diagnostics, nonparametric isotonic regression, prequential principle, score decomposition, skill score
