Open Access
VOL. 9 | 2013
Adaptive Q-learning
Yair Goldberg, Rui Song, Michael R. Kosorok

Editor(s) M. Banerjee, F. Bunea, J. Huang, V. Koltchinskii, M. H. Maathuis

Inst. Math. Stat. (IMS) Collect., 2013: 150-162 (2013) DOI: 10.1214/12-IMSCOLL911


Developing an effective multi-stage treatment strategy over time is one of the essential goals of modern medical research. Statistical inference, including the construction of confidence intervals for parameters, is of key interest in studies applying dynamic treatment regimes. Estimation and inference in this context are especially challenging because the problem is non-smooth in the parameters, which causes non-regularity. While various bootstrap methods have been proposed, most bootstrap inference methods lack theoretical validation. Recently, Song et al. [Penalized Q-learning for dynamic treatment regimes (2011) Submitted] proposed the penalized Q-learning procedure, which enables valid inference without the need for bootstrapping. A major drawback of penalized Q-learning is that it can only handle discrete covariates. To overcome this issue, we propose adaptive Q-learning, an adaptive version of penalized Q-learning. We show that the proposed method not only handles continuous covariates but can also be more efficient than penalized Q-learning.
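To make the non-smoothness concrete, here is a minimal sketch of standard, unpenalized two-stage Q-learning with linear working models and treatments coded as -1/+1. It is not the penalized or adaptive procedure the abstract refers to; the function names, the model form Q_t(H, A) = βᵀH + A·ψᵀH, and the simulated data are illustrative assumptions. Under this model the stage-1 pseudo-outcome contains |ψ₂ᵀH₂|, which is not differentiable where ψ₂ᵀH₂ = 0; that kink is what makes the stage-1 estimator non-regular.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares via numpy.linalg.lstsq; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_q_learning(h1, a1, y1, h2, a2, y2):
    """Standard two-stage Q-learning (illustrative sketch, not the authors' method).

    h1, h2: history matrices of shape (n, p1) and (n, p2), each with an intercept column.
    a1, a2: treatments coded -1/+1.
    y1, y2: rewards observed after stage 1 and stage 2.
    Assumed working model at stage t: Q_t(H, A) = beta_t^T H + A * psi_t^T H.
    """
    # Stage 2: regress y2 on [H2, A2 * H2].
    p2 = h2.shape[1]
    theta2 = fit_ols(np.hstack([h2, a2[:, None] * h2]), y2)
    beta2, psi2 = theta2[:p2], theta2[p2:]

    # Pseudo-outcome: y1 + max over a in {-1, +1} of Q_2, which equals
    # y1 + beta2^T H2 + |psi2^T H2|. The absolute value is non-smooth at 0.
    pseudo = y1 + h2 @ beta2 + np.abs(h2 @ psi2)

    # Stage 1: regress the pseudo-outcome on [H1, A1 * H1].
    p1 = h1.shape[1]
    theta1 = fit_ols(np.hstack([h1, a1[:, None] * h1]), pseudo)
    beta1, psi1 = theta1[:p1], theta1[p1:]
    return {"beta1": beta1, "psi1": psi1, "beta2": beta2, "psi2": psi2, "pseudo": pseudo}


def recommend(psi, h):
    """Estimated rule: treat with +1 when psi^T H >= 0, else -1 (ties broken toward +1)."""
    return np.where(h @ psi >= 0, 1, -1)
```

With fitted coefficients, `recommend(fit["psi2"], h2)` gives the estimated stage-2 rule. Because ψ̂₁ depends on ψ̂₂ through |ψ̂₂ᵀH₂|, standard asymptotics for ψ̂₁ break down when ψ₂ᵀH₂ is zero or near zero for a non-negligible share of patients, which is the setting penalized and adaptive Q-learning are designed to address.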


Published: 1 January 2013
First available in Project Euclid: 8 March 2013

zbMATH: 1325.62073
MathSciNet: MR3186754

Digital Object Identifier: 10.1214/12-IMSCOLL911

Primary: 62G05, 62G20
Secondary: 62F12

Keywords: adaptive estimation, dynamic treatment regimes, penalized estimation, Q-learning

Rights: Copyright © 2010, Institute of Mathematical Statistics
