Electronic Journal of Statistics

Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates

Erin LeDell, Maya Petersen, and Mark van der Laan

Full-text: Open access


In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. It is often combined with cross-validation to assess how the results will generalize to an independent data set. To evaluate the quality of a cross-validated AUC estimate, we obtain an estimate of its variance. For massive data sets, generating even a single performance estimate can be computationally expensive. Moreover, with a complex prediction method, cross-validating a predictive model on even a relatively small data set can require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence-curve-based approach to obtaining a variance estimate for cross-validated AUC.
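
As a rough illustration of the influence-curve idea mentioned in the abstract, the Python sketch below computes an empirical AUC on a single sample and estimates its variance as the sample mean of the squared estimated influence curve, divided by n. It is a simplified, non-cross-validated sketch and not the authors' implementation; the function name `auc_with_ic_variance`, the tie convention (a tied positive/negative pair counts as 1/2), and the toy data are assumptions made here for illustration.

```python
import math


def auc_with_ic_variance(labels, scores):
    """Empirical AUC and an influence-curve-style variance estimate.

    labels: sequence of 0/1 class labels (hypothetical input format).
    scores: predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    n, n1, n0 = len(labels), len(pos), len(neg)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one positive and one negative label")

    def frac_below(values, w):
        # Fraction of `values` strictly below w; ties count as 1/2 (assumption).
        hits = sum(1.0 if v < w else 0.5 if v == w else 0.0 for v in values)
        return hits / len(values)

    # AUC: average over positives of the fraction of negatives scored lower.
    # This naive pairwise version costs O(n1 * n0); sorting would be faster.
    auc = sum(frac_below(neg, s) for s in pos) / n1

    # Estimated influence curve value for each observation.
    ic = []
    for y, s in zip(labels, scores):
        if y == 1:
            # Positive: fraction of negatives below s, centered at the AUC.
            ic.append((frac_below(neg, s) - auc) / (n1 / n))
        else:
            # Negative: fraction of positives above s, centered at the AUC.
            ic.append(((1.0 - frac_below(pos, s)) - auc) / (n0 / n))

    # Variance of the AUC estimate: mean(IC^2) / n.
    var = sum(v * v for v in ic) / n / n
    return auc, var


# Toy data (made up for illustration only).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc, var = auc_with_ic_variance(labels, scores)

# Wald-type 95% interval, clipped to [0, 1].
half_width = 1.96 * math.sqrt(var)
ci = (max(0.0, auc - half_width), min(1.0, auc + half_width))
print(auc, var, ci)
```

The sketch uses only the labels and predicted scores, so no model is refit, which is where the saving over the bootstrap comes from. Extending it to cross-validated AUC, as the paper does, would require handling the validation folds, which is not shown here.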

Article information

Electron. J. Statist., Volume 9, Number 1 (2015), 1583-1607.

Received: December 2014
First available in Project Euclid: 24 July 2015

Primary: 62G15: Tolerance and confidence regions; 62G05: Estimation
Secondary: 62G20: Asymptotic properties

AUC, binary classification, confidence intervals, cross-validation, influence curve, influence function, machine learning, model selection, ROC, variance estimation


LeDell, Erin; Petersen, Maya; van der Laan, Mark. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron. J. Statist. 9 (2015), no. 1, 1583-1607. doi:10.1214/15-EJS1035. https://projecteuclid.org/euclid.ejs/1437742107

