Bernoulli

Volume 10, Number 6 (2004), 1011-1037.

Asymptotically optimal model selection method with right censored outcomes

Sündüz Keles, Mark van der Laan, and Sandrine Dudoit


Abstract

Over the last two decades, nonparametric and semi-parametric approaches that adapt well-known techniques such as regression methods to the analysis of right censored data, e.g. right censored survival data, have become popular in the statistics literature. However, the problem of choosing the best model (predictor) among a set of proposed models in the right censored data setting has received little attention. We develop a new cross-validation-based model selection method to select among predictors of right censored outcomes such as survival times. The proposed method considers the risk of a given predictor based on the training sample as a parameter of the full data distribution in a right censored data model. Then, the doubly robust locally efficient estimation method or an ad hoc inverse probability of censoring weighting method, as presented by Robins and Rotnitzky and later by van der Laan and Robins, is used to estimate this conditional risk parameter based on the validation sample. We prove that, under general conditions, the proposed cross-validated selector is asymptotically equivalent to an oracle benchmark selector based on the true data generating distribution. The method presented covers model selection with right censored data in prediction (univariate and multivariate) and density/hazard estimation problems.
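As a rough illustration of the inverse probability of censoring weighting (IPCW) variant mentioned in the abstract, the sketch below selects among a few candidate predictors by cross-validated IPCW risk. It is not the authors' implementation and does not cover the doubly robust estimator. Everything specific in it is an assumption made for the example: censoring is taken to be independent of covariates and is estimated by Kaplan-Meier on the full sample, the loss is squared error on log survival time, the candidates are weighted least squares fits on hypothetical covariate subsets, and the names `km_censoring_survival`, `fit_wls`, `cv_select` and `simulate` are invented here.

```python
import numpy as np

def km_censoring_survival(time, delta):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring
    (delta == 0) as the event. Returns a function evaluating G(t-)."""
    cens_times = np.unique(time[delta == 0])
    surv, s = [1.0], 1.0
    for u in cens_times:
        at_risk = np.sum(time >= u)
        n_cens = np.sum((time == u) & (delta == 0))
        s *= 1.0 - n_cens / at_risk
        surv.append(s)
    surv = np.array(surv)

    def G_minus(x):
        # Index = number of distinct censoring times strictly below x.
        return surv[np.searchsorted(cens_times, x, side="left")]

    return G_minus

def fit_wls(X, y, w):
    """Weighted least squares with intercept; returns a predictor."""
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def cv_select(X, time, delta, candidates, n_folds=5, seed=0):
    """Pick the candidate (a list of column indices) with the smallest
    cross-validated IPCW risk. Assumption: G is estimated once on all data."""
    rng = np.random.default_rng(seed)
    w = delta / km_censoring_survival(time, delta)(time)  # 0 if censored
    y = np.log(time)
    folds = rng.permutation(len(time)) % n_folds
    risks = np.zeros(len(candidates))
    for v in range(n_folds):
        tr, va = folds != v, folds == v
        for k, cols in enumerate(candidates):
            predict = fit_wls(X[tr][:, cols], y[tr], w[tr])
            loss = (y[va] - predict(X[va][:, cols])) ** 2
            # IPCW estimate of the risk on the validation sample.
            risks[k] += np.mean(w[va] * loss) / n_folds
    return int(np.argmin(risks)), risks

def simulate(n=400, seed=1):
    """Toy data: log survival time depends on the first covariate only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    event = np.exp(1.0 + X[:, 0] + 0.5 * rng.normal(size=n))
    cens = rng.exponential(scale=20.0, size=n)
    return X, np.minimum(event, cens), (event <= cens).astype(float)

X, time, delta = simulate()
best, risks = cv_select(X, time, delta, candidates=[[], [0], [0, 1]])
print("selected candidate:", best, "cross-validated IPCW risks:", risks)
```

The weights `delta / G(T-)` give censored observations zero weight and upweight uncensored ones, so that the weighted validation loss estimates the full-data risk when the censoring model is correct. Replacing the Kaplan-Meier step with a covariate-dependent censoring model would only change how `w` is computed.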

Article information

Source
Bernoulli, Volume 10, Number 6 (2004), 1011-1037.

Dates
First available in Project Euclid: 21 January 2005

Permanent link to this document
https://projecteuclid.org/euclid.bj/1106314848

Digital Object Identifier
doi:10.3150/bj/1106314848

Mathematical Reviews number (MathSciNet)
MR2108041

Zentralblatt MATH identifier
1064.62047

Keywords
cross-validation; density/hazard estimator selection; model selection; multivariate prediction; nonparametric/semi-parametric regression; prediction of survival; right censored data; univariate prediction

Citation

Keles, Sündüz; van der Laan, Mark; Dudoit, Sandrine. Asymptotically optimal model selection method with right censored outcomes. Bernoulli 10 (2004), no. 6, 1011-1037. doi:10.3150/bj/1106314848. https://projecteuclid.org/euclid.bj/1106314848



References

  • [1] Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. In B. Petrov and F. Csáki (eds), Second International Symposium on Information Theory, pp. 267-281. Budapest: Akadémiai Kiadó.
  • [2] Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., Tran, T., Yu, X., Powell, J., Yang, L., Marti, G., Moore, T., Hudson, J., Lu, L., Lewis, D., Tibshirani, R., Sherlock, G., Chan, W., Greiner, T., Weisenburger, D., Armitage, J., Warnke, R., Levy, R., Wilson, W., Grever, M., Byrd, J., Botstein, D., Brown, P. and Staudt, L. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403(6769), 503-511.
  • [3] Andersen, P., Borgan, Ø., Gill, R. and Keiding, N. (1993) Statistical Models Based on Counting Processes. New York: Springer-Verlag.
  • [4] Beer, D., Kardia, S., Huang, C.-C., Giordano, T., Levin, A., Misek, D., Lin, L., Chen, G., Gharib, T.G., Thomas, D.G., Lizyness, M.L., Kuick, R., Hayasaka, S., Taylor, J., Iannettoni, M., Obringer, M. and Hanash, S. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med., 8, 816-824.
  • [5] Binder, D.A. (1992) Fitting Cox proportional hazards models from survey data. Biometrika, 79, 139-147.
  • [6] Bozdogan, H. (2000) Akaike's information criterion and recent developments in information complexity. J. Math. Psych., 44, 62-91.
  • [7] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks/Cole.
  • [8] Burman, P. (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning testing methods. Biometrika, 76, 503-514.
  • [9] Davis, R. and Anderson, J. (1989) Exponential survival trees. Statist. Med., 8, 947-961.
  • [10] Garber, M.E., Troyanskaya, O.G., Schluens, K., Petersen, S., Thaesler, Z., Pacyna-Gengelbach, M., van de Rijn, M., Rosen, G.D., Perou, C.M., Whyte, R.I., Altman, R.B., Brown, P.O., Botstein, D. and Petersen, I. (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc. Nat. Acad. Sci. USA, 98, 13784-13789.
  • [11] Gill, R., van der Laan, M. and Robins, J. (1997) Coarsening at random: characterizations, conjectures and counter-examples. In D. Lin and T. Fleming (eds), Proceedings of the First Seattle Symposium in Biostatistics, pp. 255-294. New York: Springer-Verlag.
  • [12] Gordon, L. and Olshen, R. (1985) Tree-structured survival analysis. Cancer Treatment Rep., 69, 1062-1069.
  • [13] Graf, E., Schmoor, C., Sauerbrei, W. and Schumacher, M. (1999) Assessment and comparison of prognostic classification schemes for survival data. Statist. Med., 18, 2529-2545.
  • [14] Hastie, T. and Tibshirani, R. (1990a) Exploring the nature of covariate effects in the proportional hazards model. Biometrics, 46, 1005-1016.
  • [15] Hastie, T. and Tibshirani, R. (1990b) Generalized Additive Models. London: Chapman and Hall.
  • [16] Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data. Ann. Statist., 19, 2244-2253.
  • [17] Jacobsen, M. and Keiding, N. (1995) Coarsening at random in general sample spaces and random censoring in continuous time. Ann. Statist., 23, 774-786.
  • [18] Keles, S. (2003) Statistical methods for cis-regulatory motif detection in DNA sequences and two censored data problems. PhD thesis, University of California, Berkeley.
  • [19] Keles, S., van der Laan, M. and Dudoit, S. (2003) Asymptotically optimal model selection method with right censored outcomes. Technical Report 124, Division of Biostatistics, University of California, Berkeley. http://www.bepress.com/ucbbiostat/paper124/
  • [20] Kooperberg, C., Stone, C. and Truong, Y. (1995) Hazard regression. J. Amer. Statist. Assoc., 90, 78-94.
  • [21] Korn, E. and Simon, R. (1990) Measures of explained variation for survival data. Statist. Med., 9, 487-503.
  • [22] Leblanc, M. and Crowley, J. (1992) Relative risk trees for censored data. Biometrics, 48, 411-425.
  • [23] O'Quigley, J. and Xu, R. (2001) Explained variation in proportional hazards regression. In J. Crowley (ed.), Handbook of Statistics in Clinical Oncology, pp. 397-409. New York: Marcel Dekker.
  • [24] Pugh, M., Robins, J., Lipsitz, S. and Harrington, D. (1993) Inference in the Cox proportional hazards model with missing covariate. Technical report, Department of Biostatistics, Harvard University.
  • [25] Robins, J. (1993) Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. Proc. Biopharm. Sect. Amer. Statist. Assoc., 24-33.
  • [26] Robins, J. and Rotnitzky, A. (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In N.P. Jewell, K. Dietz and V.T. Farewell (eds), AIDS Epidemiology: Methodological Issues. Boston: Birkhäuser.
  • [27] Robins, J. and Rotnitzky, A. (2001) Comment on the Bickel and Kwon article, 'Inference for semiparametric models: Some questions and an answer'. Statist. Sinica, 11, 920-936.
  • [28] Robins, J., Rotnitzky, A. and van der Laan, M. (2000) Comment on 'On Profile Likelihood' by S.A. Murphy and A.W. van der Vaart. J. Amer. Statist. Assoc., 95, 477-482.
  • [29] Ruczinski, I., Kooperberg, C. and Leblanc, M.L. (2001) Logic regression. Manuscript.
  • [30] Schemper, M. and Henderson, R. (2000) Predictive accuracy and explained variation in Cox regression. Biometrics, 56, 249-255.
  • [31] Schemper, M. and Stare, J. (1996) Explained variation in survival analysis. Statist. Med., 15, 1999-2012.
  • [32] Schwarz, G. (1978) Estimating the dimension of a model. Ann. Statist., 6, 461-464.
  • [33] Segal, M. (1988) Regression trees for censored data. Biometrics, 44, 35-47.
  • [34] Shao, J. (1993) Linear model selection by cross-validation. J. Amer. Statist. Assoc., 88, 486-494.
  • [35] Sorlie, T., Perou, C.M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Thorsen, T., Quist, H., Matese, J.C., Brown, P.O., Botstein, D., Lonning, P.E. and Borresen-Dale, A.-L. (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Nat. Acad. Sci. USA, 98, 10869-10874.
  • [36] van der Laan, M.J. and Dudoit, S. (2003) Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples. Technical Report 130, Division of Biostatistics, University of California, Berkeley. http://www.bepress.com/ucbbiostat/paper130/
  • [37] van der Laan, M. and Robins, J. (2003) Unified Methods for Censored Longitudinal Data and Causality. New York: Springer-Verlag.
  • [38] van der Vaart, A. and Wellner, J. (1996) Weak Convergence and Empirical Processes. New York: Springer-Verlag.
  • [39] Wigle, D.A., Jurisica, I., Radulovich, N., Pintilie, M., Rossant, J., Ni Liu, C.L., Woodgett, J., Seiden, I., Johnston, M., Shaf Keshavjee, G.D., Winton, T., Breitkreutz, B.J., Jorgenson, P., Mike Tyers, F.A.S. and Tsao, M.S. (2002) Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Res., 62, 3005-3008.