A bias correction for the minimum error rate in cross-validation
Ryan J. Tibshirani and Robert Tibshirani
Source: Ann. Appl. Stat. Volume 3, Number 2 (2009), 822-829.
Abstract
Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter. We propose a simple method for the estimation of this bias that uses information from the cross-validation process. As a result, it requires essentially no additional computation. We apply our bias estimate to a number of popular classifiers in various settings, and examine its performance.
Keywords: Cross-validation; prediction error estimation; optimism estimation
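The abstract does not spell out the bias estimate, so the following is only a minimal sketch of one way such a correction could be computed from quantities that K-fold cross-validation already produces. It assumes the estimate is the average, over folds, of the gap between each fold's error at the overall CV-minimizing tuning value and that fold's own minimum error. The function name `cv_min_error_bias` and the `(K, M)` array layout are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np


def cv_min_error_bias(fold_errors):
    """Sketch of a bias estimate for the minimum CV error (assumed form).

    fold_errors: array-like of shape (K, M); entry [k, j] is the error on
    held-out fold k at the j-th candidate tuning-parameter value.
    Returns (index of CV minimizer, minimum CV error, estimated bias).
    """
    fold_errors = np.asarray(fold_errors, dtype=float)

    # Overall CV curve: simple average across folds (assumes equal-size folds).
    cv_curve = fold_errors.mean(axis=0)
    best = int(np.argmin(cv_curve))

    # Each fold's own minimum error over the tuning grid.
    per_fold_min = fold_errors.min(axis=1)

    # Assumed estimator: mean gap between the error at the overall minimizer
    # and the fold-wise minimum. Each gap is non-negative by construction.
    bias = float(np.mean(fold_errors[:, best] - per_fold_min))
    return best, float(cv_curve[best]), bias


if __name__ == "__main__":
    # Toy error matrix: 3 folds, 3 candidate tuning values (made-up numbers).
    errors = [
        [0.30, 0.20, 0.25],
        [0.10, 0.20, 0.30],
        [0.25, 0.20, 0.15],
    ]
    best, cv_min, bias = cv_min_error_bias(errors)
    print("selected index:", best)
    print("minimum CV error:", cv_min)
    print("bias-corrected error:", cv_min + bias)
```

Because the sketch reuses only the per-fold error curves, it needs no model refitting beyond the original cross-validation, which is consistent with the abstract's claim of essentially no additional computation.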
Links and Identifiers
Permanent link to this document: http://projecteuclid.org/euclid.aoas/1245676196
Digital Object Identifier: doi:10.1214/08-AOAS224
Zentralblatt MATH identifier: 1166.62311