Electronic Journal of Statistics

Support vector regression for right censored data

Yair Goldberg and Michael R. Kosorok

Full-text: Open access

Abstract

We develop a unified approach to classification and regression support vector machines when the responses are subject to right censoring. We provide finite-sample bounds on the generalization error of the algorithm, prove risk consistency for a wide class of probability measures, and study the associated learning rates. We apply the general methodology to estimation of the (truncated) mean, median, and quantiles, and to classification problems. We present a simulation study that demonstrates the performance of the proposed approach.
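To make the censored-risk idea concrete, here is a minimal sketch of one standard way to adapt empirical risk minimization to right-censored responses: inverse-probability-of-censoring weighting (IPCW), with the censoring survival function estimated by Kaplan-Meier. It is an illustration of that general technique under stated assumptions, not the paper's algorithm. The function names (`censoring_survival_left`, `ipcw_weights`, `ipcw_risk`, `pinball_loss`) and the truncation point `tau` are hypothetical. The squared loss targets the truncated mean and the pinball loss targets quantiles (the median at `q = 0.5`).

```python
import numpy as np


def censoring_survival_left(times, events, t):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), the censoring survival
    function just before t. Observations with events == 0 are treated as
    censoring 'events'. The product runs over censoring times s < t, so a
    failure tied with a censoring time is assumed to occur first."""
    surv = 1.0
    for s in np.unique(times[events == 0]):  # sorted distinct censoring times
        if s >= t:
            break
        at_risk = np.sum(times >= s)
        censored_at_s = np.sum((times == s) & (events == 0))
        surv *= 1.0 - censored_at_s / at_risk
    return surv


def ipcw_weights(times, events, tau=np.inf):
    """Return truncated responses y_i = min(T_i, tau) and IPCW weights
    w_i = Delta_i / G(y_i-), where Delta_i = 1 if y_i is fully observed
    (an uncensored failure, or an observed time already at or past tau)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    y = np.minimum(times, tau)
    observed = (events == 1) | (times >= tau)
    w = np.zeros_like(times)
    for i in np.flatnonzero(observed):
        w[i] = 1.0 / censoring_survival_left(times, events, y[i])
    return y, w


def squared_loss(y, f):
    # Targets the (truncated) conditional mean.
    return (y - f) ** 2


def pinball_loss(q):
    # Targets the q-th conditional quantile; q = 0.5 gives the median.
    def loss(y, f):
        r = y - f
        return np.maximum(q * r, (q - 1.0) * r)
    return loss


def ipcw_risk(pred, times, events, loss, tau=np.inf):
    """Weighted empirical risk (1/n) * sum_i w_i * loss(y_i, f(x_i))."""
    y, w = ipcw_weights(times, events, tau)
    return float(np.mean(w * loss(y, np.asarray(pred, dtype=float))))


if __name__ == "__main__":
    times = np.array([1.0, 2.0, 3.0, 4.0])
    events = np.array([1, 0, 1, 1])  # the second subject is censored at t = 2
    y, w = ipcw_weights(times, events)
    print(w)  # w == [1.0, 0.0, 1.5, 1.5]
    print(ipcw_risk(np.zeros(4), times, events, pinball_loss(0.5)))
```

Censored observations get weight zero, and the mass they carry is redistributed to later uncensored observations. A support-vector-type estimator would then minimize such a weighted risk plus a regularization penalty over a reproducing kernel Hilbert space; that optimization step is deliberately omitted from this sketch.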

Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 532-569.

Dates
Received: February 2016
First available in Project Euclid: 2 March 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1488423807

Digital Object Identifier
doi:10.1214/17-EJS1231

Mathematical Reviews number (MathSciNet)
MR3619316

Zentralblatt MATH identifier
1390.62195

Keywords
Support vector regression; right censored data; generalization error; universal consistency; misspecification models

Rights
Creative Commons Attribution 4.0 International License.

Citation

Goldberg, Yair; Kosorok, Michael R. Support vector regression for right censored data. Electron. J. Statist. 11 (2017), no. 1, 532-569. doi:10.1214/17-EJS1231. https://projecteuclid.org/euclid.ejs/1488423807


Supplemental materials