Developing an effective multi-stage treatment strategy over time is one of the essential goals of modern medical research. Developing statistical inference, including constructing confidence intervals for parameters, is of key interest in studies applying dynamic treatment regimens. Estimation and inference in this context are especially challenging due to non-regularity caused by the non-smoothness of the problem in the parameters. While various bootstrap methods have been proposed, there is a lack of theoretical validation for most bootstrap inference methods. Recently, Song et al. [Penalized Q-learning for dynamic treatment regimes (2011) Submitted] proposed the penalized Q-learning procedure, that enables valid inference without the need of bootstrapping. As a major drawback, penalized Q-learning can only handle discrete covariates. To overcome this issue, we propose an adaptive Q-learning procedure which is an adaptive version of penalized Q-learning. We show that the proposed method can not only handle continuous covariates, but it can also be more efficient than penalized Q-learning.
Digital Object Identifier: 10.1214/12-IMSCOLL911