The Annals of Applied Statistics

A bias correction for the minimum error rate in cross-validation

Ryan J. Tibshirani and Robert Tibshirani

Source: Ann. Appl. Stat. Volume 3, Number 2 (2009), 822-829.

Abstract

Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter. We propose a simple method for the estimation of this bias that uses information from the cross-validation process. As a result, it requires essentially no additional computation. We apply our bias estimate to a number of popular classifiers in various settings, and examine its performance.
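The abstract does not give the formula for the bias estimate, so the following is only a minimal sketch of one way to compute such an estimate from quantities that cross-validation already produces. It assumes equal-sized folds and takes the estimate to be the average, over folds, of the gap between each fold's error at the overall cross-validation minimizer and that fold's own minimum error. The function name `cv_min_bias_estimate` and the array `fold_errors` are hypothetical and the numbers are made up for illustration.

```python
import numpy as np


def cv_min_bias_estimate(fold_errors):
    """Sketch of a bias estimate for the minimum cross-validation error.

    fold_errors: array-like of shape (K, M), where entry [k, m] is the
    error on held-out fold k at the m-th tuning parameter value.
    Assumes equal fold sizes, so the CV curve is the plain mean over folds.
    Returns (index of the CV minimizer, minimum CV error, bias estimate).
    """
    fold_errors = np.asarray(fold_errors, dtype=float)

    # Overall CV error curve and the tuning value that minimizes it.
    cv_curve = fold_errors.mean(axis=0)
    best = int(np.argmin(cv_curve))

    # For each fold: error at the overall minimizer minus that fold's
    # own minimum error. Each gap is non-negative by construction.
    gaps = fold_errors[:, best] - fold_errors.min(axis=1)

    return best, float(cv_curve[best]), float(gaps.mean())


# Illustrative (made-up) errors: 3 folds, 3 candidate tuning values.
example_errors = [
    [0.30, 0.20, 0.25],
    [0.10, 0.20, 0.30],
    [0.25, 0.15, 0.20],
]
best, min_cv_error, bias = cv_min_bias_estimate(example_errors)
corrected_error = min_cv_error + bias  # upward-adjusted error estimate
```

Because the per-fold error curves are already computed during cross-validation, this adds no model fitting, which matches the abstract's remark that the estimate requires essentially no additional computation.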

Keywords: Cross-validation; prediction error estimation; optimism estimation

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1245676196
Digital Object Identifier: doi:10.1214/08-AOAS224
Zentralblatt MATH identifier: 1166.62311

2009 © Institute of Mathematical Statistics