A common principle in model diagnostics and forecast evaluation is that fitted or predicted distributions ought to be reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary responses, auto-calibration is the universal concept of reliability. For real-valued outcomes, a general theory of calibration has been elusive, despite a recent surge of interest in distributional regression and machine learning. We develop a framework rooted in probability theory, which gives rise to hierarchies of calibration and applies to both predictive distributions and stand-alone point forecasts. In a nutshell, a prediction is conditionally T-calibrated if it can be taken at face value in terms of an identifiable functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration, discrimination, and uncertainty. In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination that nests and reinterprets the classical R² of least squares regression and its natural analog in quantile regression, yet applies to T-regression in general.
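The abstract refers to the pool-adjacent-violators algorithm (PAVA) as the computational workhorse behind the isotonic-regression estimators. As a minimal illustration only, and not the authors' implementation, the classical PAVA for a least-squares isotonic fit can be sketched as follows; the function name `pava` is our own choice:

```python
def pava(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y.

    Maintains a stack of blocks (mean, weight); whenever a new value
    violates monotonicity, adjacent blocks are pooled into their
    weighted mean until the block means are nondecreasing.
    """
    blocks = []  # each entry: [block mean, block weight]
    for v in y:
        blocks.append([float(v), 1])
        # pool while the last two block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w])
    # expand blocks back to one fitted value per observation
    fit = []
    for m, w in blocks:
        fit.extend([m] * w)
    return fit


print(pava([1, 3, 2, 4]))  # the violating pair (3, 2) is pooled to 2.5
```

Applied to binary outcomes ordered by forecast value, this monotone fit yields the nonparametric recalibration curve underlying the reliability diagrams discussed in the paper.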
Our research has been funded by the Klaus Tschira Foundation. Johannes Resin gratefully acknowledges support from the German Research Foundation (DFG) through grant number 502572912.
The authors would like to thank Sebastian Arnold, Fadoua Balabdaoui-Mohr, Jonas Brehmer, Frank Diebold, Timo Dimitriadis, Uwe Ehret, Andreas Fink, Tobias Fissler, Rafael Frongillo, Norbert Henze, Alexander Henzi, Alexander I. Jordan, Kristof Kraus, Fabian Krüger, Sebastian Lerch, Michael Maier-Gerber, Anja Mühlemann, Jim Pitman, Marc-Oliver Pohle, Roopesh Ranjan, Benedikt Schulz, Ville Satopää, Daniel Wolffram and Johanna F. Ziegel, as well as anonymous reviewers, for helpful comments and discussion.
"Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination." Electron. J. Statist. 17 (2) 3226 - 3286, 2023. https://doi.org/10.1214/23-EJS2180