The Annals of Statistics
- Ann. Statist.
- Volume 40, Number 1 (2012), 529-560.
Q-learning with censored data
We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications in the design of personalized medicine trials in cancer and in other life-threatening diseases.
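To make the idea of adjusting a Q-function estimate for censoring concrete, the following is a minimal single-stage sketch, not the algorithm of the paper. It assumes one common adjustment, inverse-probability-of-censoring weighting: uncensored subjects are weighted by the inverse of a Kaplan-Meier estimate of the censoring survival function, and the Q-value of each (state, action) pair is the weighted mean survival time. The function names, the tabular (discrete state and action) setting, and the pooled censoring model are all illustrative assumptions.

```python
from collections import defaultdict


def censoring_survival_left(times, observed):
    """Return t -> estimated P(C >= t), a left-continuous Kaplan-Meier
    estimate in which censoring (observed == False) is the event.
    Assumption: ties between failure and censoring times are not
    treated specially."""
    censor_times = sorted({t for t, o in zip(times, observed) if not o})
    steps = []
    for s in censor_times:
        at_risk = sum(1 for t in times if t >= s)
        censored_here = sum(
            1 for t, o in zip(times, observed) if t == s and not o
        )
        steps.append((s, 1.0 - censored_here / at_risk))

    def g_minus(t):
        prob = 1.0
        for s, factor in steps:
            if s < t:  # strictly earlier censoring times only
                prob *= factor
        return prob

    return g_minus


def ipcw_q_table(states, actions, times, observed):
    """Weighted mean survival time per (state, action), using only
    uncensored subjects, each weighted by 1 / P(C >= T)."""
    g_minus = censoring_survival_left(times, observed)
    num = defaultdict(float)
    den = defaultdict(float)
    for s, a, t, o in zip(states, actions, times, observed):
        if not o:
            continue  # censored subjects contribute only through the weights
        w = 1.0 / g_minus(t)
        num[(s, a)] += w * t
        den[(s, a)] += w
    return {key: num[key] / den[key] for key in num}


def greedy_action(q_table, state):
    """Pick the action with the largest estimated Q-value in a state."""
    candidates = {a: q for (s, a), q in q_table.items() if s == state}
    return max(candidates, key=candidates.get)
```

In a multistage setting, a table like this would be fitted backward from the last stage, with the fitted maximum over actions at the next stage added to the reward; that step, and the handling of a varying number of stages, are left out of this sketch.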
First available in Project Euclid: 7 May 2012
Goldberg, Yair; Kosorok, Michael R. Q-learning with censored data. Ann. Statist. 40 (2012), no. 1, 529--560. doi:10.1214/12-AOS968. https://projecteuclid.org/euclid.aos/1336396182
- Supplementary material: Code and data sets. Please read the file README.pdf for details on the files in this folder.