Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates

Erin LeDell; Maya Petersen; Mark van der Laan

doi:10.1214/15-EJS1035

Electron. J. Statist. 9(1): 1583-1607 (2015). DOI: 10.1214/15-EJS1035

Abstract

In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation to assess how the results will generalize to an independent data set. To evaluate the quality of an estimate of cross-validated AUC, we obtain an estimate of its variance. For massive data sets, generating even a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence-curve-based approach to obtaining a variance estimate for cross-validated AUC.
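The idea in the abstract can be illustrated with a minimal Python sketch. It assumes the standard influence function of the empirical (Mann-Whitney) AUC, applied within each validation fold, and a normal-approximation interval; the paper's estimator may differ in its details. The function names `fold_auc_and_ic` and `cv_auc_ci`, the tie handling, and the default `z=1.96` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fold_auc_and_ic(pred, y, p1, p0):
    """Empirical AUC and influence-curve values for one validation fold.

    p1 and p0 are the overall proportions of positives and negatives
    (an assumption of this sketch).
    """
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=int)
    pos, neg = pred[y == 1], pred[y == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("each fold needs at least one positive and one negative")

    def frac_below(values, ref):
        # Share of `ref` scores strictly below each value; ties count one half.
        less = (ref[None, :] < values[:, None]).mean(axis=1)
        tied = (ref[None, :] == values[:, None]).mean(axis=1)
        return less + 0.5 * tied

    f_pos = frac_below(pos, neg)        # P(negative score < this positive's score)
    f_neg = 1.0 - frac_below(neg, pos)  # P(positive score > this negative's score)
    auc = f_pos.mean()

    ic = np.empty(pred.size)
    ic[y == 1] = (f_pos - auc) / p1
    ic[y == 0] = (f_neg - auc) / p0
    return auc, ic


def cv_auc_ci(pred, y, folds, z=1.96):
    """Cross-validated AUC, its standard error, and a Wald-type interval.

    `pred` holds held-out predictions; `folds` gives each row's validation fold.
    """
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=int)
    folds = np.asarray(folds)
    n = y.size
    p1 = y.mean()
    p0 = 1.0 - p1

    aucs, mean_ic2 = [], []
    for v in np.unique(folds):
        mask = folds == v
        auc_v, ic_v = fold_auc_and_ic(pred[mask], y[mask], p1, p0)
        aucs.append(auc_v)
        mean_ic2.append(np.mean(ic_v ** 2))

    cv_auc = float(np.mean(aucs))                 # average of fold-specific AUCs
    se = float(np.sqrt(np.mean(mean_ic2) / n))    # variance from squared IC values
    ci = (max(cv_auc - z * se, 0.0), min(cv_auc + z * se, 1.0))
    return cv_auc, se, ci
```

Because the variance comes from a single pass over predictions that cross-validation has already produced, no model is refit, which is where the saving over the bootstrap would come from.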

Citation

Erin LeDell, Maya Petersen, Mark van der Laan. "Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates." Electron. J. Statist. 9 (1) 1583-1607, 2015. https://doi.org/10.1214/15-EJS1035

Information

Received: 1 December 2014; Published: 2015

First available in Project Euclid: 24 July 2015

zbMATH: 1327.62298

MathSciNet: MR3376118

Digital Object Identifier: 10.1214/15-EJS1035

Subjects:

Primary: 62G05, 62G15

Secondary: 62G20

Keywords: AUC, binary classification, confidence intervals, cross-validation, influence curve, influence function, machine learning, model selection, ROC, variance estimation