Open Access
February 2012
Q-learning with censored data
Yair Goldberg, Michael R. Kosorok
Ann. Statist. 40(1): 529-560 (February 2012). DOI: 10.1214/12-AOS968


We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications in the design of personalized medicine trials in cancer and in other life-threatening diseases.


Yair Goldberg, Michael R. Kosorok. "Q-learning with censored data." Ann. Statist. 40(1): 529-560, February 2012.


Published: February 2012
First available in Project Euclid: 7 May 2012

zbMATH: 1246.62206
MathSciNet: MR3014316
Digital Object Identifier: 10.1214/12-AOS968

Primary: 62G05, 62G20, 62N02

Keywords: generalization error, Q-learning, reinforcement learning, survival analysis

Rights: Copyright © 2012 Institute of Mathematical Statistics
