## Electronic Journal of Statistics

### A note on the use of empirical AUC for evaluating probabilistic forecasts

Simon Byrne

#### Abstract

Scoring functions are used to evaluate and compare partially probabilistic forecasts. We investigate the use of rank-sum functions such as the empirical Area Under the Curve (AUC), a widely used measure of classification performance, as a scoring function for the prediction of probabilities of a set of binary outcomes. It is shown that the AUC is not generally a proper scoring function; that is, under certain circumstances it is possible to improve on the expected AUC by modifying the quoted probabilities from their true values. However, with some restrictions or with certain modifications, it can be made proper.
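As an illustration of what a rank-sum function is, the following minimal Python sketch computes the empirical AUC as the fraction of (positive, negative) pairs that the quoted probabilities order correctly. It is not taken from the paper: the function name `empirical_auc`, the half-credit convention for ties, and the choice to return `None` when only one outcome class is present are all assumptions made for this example.

```python
from itertools import product


def empirical_auc(probs, outcomes):
    """Empirical AUC of quoted probabilities against binary outcomes.

    Assumed conventions (not from the paper): a tied pair counts 1/2,
    and the result is None when all outcomes fall in a single class.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        return None  # no (positive, negative) pairs to compare

    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0  # positive case ranked above negative case
        elif p == n:
            total += 0.5  # tie: half credit (assumed convention)
    return total / (len(pos) * len(neg))


# Hypothetical quoted probabilities and observed outcomes.
quoted = [0.9, 0.4, 0.7, 0.2]
outcomes = [1, 0, 0, 1]
auc = empirical_auc(quoted, outcomes)

# A strictly increasing transformation leaves the ranks, and so the AUC, unchanged.
squashed = [q ** 2 for q in quoted]
auc_squashed = empirical_auc(squashed, outcomes)
```

Because the score depends on the quoted probabilities only through their ordering, `auc` and `auc_squashed` agree, so two forecasts with quite different probability values can receive identical scores.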

#### Article information

Source
Electron. J. Statist., Volume 10, Number 1 (2016), 380-393.

Dates
First available in Project Euclid: 17 February 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1455715967

Digital Object Identifier
doi:10.1214/16-EJS1109

Mathematical Reviews number (MathSciNet)
MR3466187

Zentralblatt MATH identifier
06549025

Subjects
Primary: 62C99: None of the above, but in this section

#### Citation

Byrne, Simon. A note on the use of empirical AUC for evaluating probabilistic forecasts. Electron. J. Statist. 10 (2016), no. 1, 380--393. doi:10.1214/16-EJS1109. https://projecteuclid.org/euclid.ejs/1455715967
