Electronic Journal of Statistics

Tree-based censored regression with applications in insurance

Olivier Lopez, Xavier Milhaud, and Pierre-E. Thérond

Full-text: Open access


We propose a regression tree procedure to estimate the conditional distribution of a variable that is not directly observed because of censoring. The model we consider is motivated by applications in insurance, including the analysis of guarantees that involve durations, and claim reserving. We derive consistency results for our procedure and for the selection of an optimal subtree using a pruning strategy. These theoretical results are supported by a simulation study and by two applications to insurance datasets. The first concerns income protection insurance, while the second deals with reserving in third-party liability insurance.
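The abstract does not spell out how the tree is built, so what follows is only a hedged illustration. It is a minimal Python sketch of one standard way to grow a regression tree from right-censored data, and it should not be read as the authors' exact estimator. Each uncensored observation is reweighted by the inverse of a Kaplan-Meier estimate of the censoring survival function, a technique known as inverse probability of censoring weighting. Using this weighting here is an assumption, not a claim about the paper. Splits then minimise a weighted squared-error criterion. The sketch also assumes that censoring is independent of both the duration T and the covariates X, and that there are no tied observation times. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names throughout; a sketch of IPCW-weighted CART, not the
# paper's estimator. Assumes censoring independent of (T, X) and no tied times.


def censoring_survival_before(y, delta):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed y_i.

    Event and censoring swap roles: a point is a 'failure' of the censoring
    distribution when delta_i == 0.
    """
    n = len(y)
    order = np.argsort(y, kind="stable")
    cens = 1.0 - delta[order]
    at_risk = n - np.arange(n)                         # number at risk at y_(k)
    g_after = np.cumprod(1.0 - cens / at_risk)         # G(y_(k)), right-continuous
    g_before = np.concatenate(([1.0], g_after[:-1]))   # left limit G(y_(k)-)
    out = np.empty(n)
    out[order] = g_before
    return out


def ipcw_weights(y, delta):
    """delta_i / (n * G(y_i-)): zero for censored points, larger for late uncensored ones."""
    return delta / (len(y) * censoring_survival_before(y, delta))


def weighted_sse(v, w):
    """Weighted squared error of the constant fit sum(w * v) / sum(w)."""
    total = w.sum()
    if total <= 0:
        return 0.0
    mean = np.dot(w, v) / total
    return float(np.dot(w, (v - mean) ** 2))


def best_split(x, y, w, min_leaf):
    """Threshold on one covariate minimising the summed weighted SSE of the children."""
    xs = np.sort(x)
    best_loss, best_thr = np.inf, None
    for k in range(min_leaf, len(x) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue                                   # no threshold between tied values
        thr = 0.5 * (xs[k - 1] + xs[k])
        left = x <= thr
        loss = weighted_sse(y[left], w[left]) + weighted_sse(y[~left], w[~left])
        if loss < best_loss:
            best_loss, best_thr = loss, thr
    return best_loss, best_thr


def grow(X, y, w, depth=0, max_depth=2, min_leaf=20):
    """Grow a binary tree greedily; leaves keep (y, w) to estimate a CDF."""
    best = (weighted_sse(y, w), None, None)            # (loss, feature, threshold)
    if depth < max_depth:
        for j in range(X.shape[1]):
            loss, thr = best_split(X[:, j], y, w, min_leaf)
            if thr is not None and loss < best[0]:
                best = (loss, j, thr)
    if best[1] is None:
        return {"y": y, "w": w}
    _, j, thr = best
    left = X[:, j] <= thr
    return {"feature": j, "threshold": thr,
            "left": grow(X[left], y[left], w[left], depth + 1, max_depth, min_leaf),
            "right": grow(X[~left], y[~left], w[~left], depth + 1, max_depth, min_leaf)}


def conditional_cdf(tree, x, t):
    """Estimate P(T <= t | X = x) by the weighted empirical CDF of x's leaf."""
    node = tree
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    total = node["w"].sum()
    if total <= 0:
        return float("nan")
    return float(np.dot(node["w"], (node["y"] <= t).astype(float)) / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(2000, 1))
    T = rng.exponential(np.where(X[:, 0] < 0.5, 1.0, 4.0))   # simulated durations
    C = rng.exponential(40.0, size=2000)                      # independent censoring
    y, delta = np.minimum(T, C), (T <= C).astype(float)
    tree = grow(X, y, ipcw_weights(y, delta), max_depth=1)
    print(tree["threshold"], conditional_cdf(tree, np.array([0.2]), 2.0))
```

Each leaf returns a weighted empirical distribution function. The tree therefore targets the whole conditional distribution of T rather than only its conditional mean. The sketch stops after growing the tree. Selecting a subtree by pruning, which the abstract emphasises, would be a further step (for example CART-style cost-complexity pruning) and is not shown.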

Article information

Electron. J. Statist., Volume 10, Number 2 (2016), 2685-2716.

Received: October 2015
First available in Project Euclid: 12 September 2016

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1473685451

Digital Object Identifier: doi:10.1214/16-EJS1189


Primary: 62N01 (Censored data models); 62N02 (Estimation); 62G08 (Nonparametric regression)
Secondary: 91B30 (Risk theory, insurance); 97M30 (Financial and insurance mathematics)

Keywords: survival analysis; censoring; regression tree; model selection; insurance


Lopez, Olivier; Milhaud, Xavier; Thérond, Pierre-E. Tree-based censored regression with applications in insurance. Electron. J. Statist. 10 (2016), no. 2, 2685–2716. doi:10.1214/16-EJS1189. https://projecteuclid.org/euclid.ejs/1473685451


