Electronic Journal of Statistics

D-learning to estimate optimal individual treatment rules

Zhengling Qi and Yufeng Liu

Full-text: Open access


Recent exploration of the optimal individual treatment rule (ITR) for patients has attracted considerable attention due to the potentially heterogeneous responses of patients to different treatments. An optimal ITR is a decision function that, based on patients’ characteristics, selects the treatment that maximizes the expected clinical outcome. The current literature mainly focuses on two types of methods: model-based and classification-based. Model-based methods rely on estimating the conditional mean of the outcome instead of directly targeting the decision boundaries of the optimal ITR. As a result, they may yield suboptimal decisions. In contrast, although classification-based methods directly target the optimal ITR by converting the problem into weighted classification, these methods rely on using correct weights for all subjects, which may cause model misspecification. To overcome the potential drawbacks of these methods, we propose a simple and flexible one-step method to directly learn (D-learning) the optimal ITR without model and weight specifications. Multi-category D-learning is also proposed for the case with multiple treatments. A new effect measure is proposed to quantify the relative strength of a treatment for a patient. We show estimation consistency and establish tight finite sample error bounds for the proposed D-learning. Numerical studies, including simulated and real data examples, are used to demonstrate the competitive performance of D-learning.
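The abstract does not spell out the estimator, so the following is only an illustrative sketch of what learning a treatment rule directly can look like. It assumes a binary treatment coded as -1/+1, a known randomization probability of 0.5, and a linear decision function fitted by inverse-propensity-weighted least squares on the transformed response 2*Y*A. The function names `fit_direct_rule` and `recommend`, and the simulated data, are hypothetical and not taken from the paper.

```python
import numpy as np


def fit_direct_rule(X, A, Y, propensity):
    """Fit a linear decision function f(x) = b0 + x @ b.

    Assumption (not from the source): f is estimated by weighted least
    squares of 2*Y*A on (1, X) with weights 1/propensity. Under
    randomization, the population minimizer of this criterion is the
    treatment contrast E[Y | X, A=1] - E[Y | X, A=-1].
    """
    design = np.column_stack([np.ones(len(X)), X])
    root_w = np.sqrt(1.0 / propensity)
    # Weighted least squares via rescaling rows by sqrt(weight).
    beta, *_ = np.linalg.lstsq(
        design * root_w[:, None], 2.0 * Y * A * root_w, rcond=None
    )
    return beta


def recommend(beta, X):
    """Recommend treatment +1 where the fitted f(x) is non-negative, else -1."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.where(design @ beta >= 0, 1, -1)


# Simulated randomized trial (hypothetical data for illustration only).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
A = rng.choice([-1, 1], size=n)
# Main effect 1 + X[:, 1]; the treatment effect depends on X[:, 0] only,
# so the best rule in this simulation is sign(X[:, 0]).
Y = 1.0 + X[:, 1] + A * X[:, 0] + 0.1 * rng.normal(size=n)

beta = fit_direct_rule(X, A, Y, propensity=np.full(n, 0.5))
optimal = np.where(X[:, 0] >= 0, 1, -1)
agreement = np.mean(recommend(beta, X) == optimal)
print("fitted coefficients:", beta)
print("agreement with the simulated optimal rule:", agreement)
```

In this sketch only the sign of the fitted function drives the recommendation, so neither the main effect of the covariates on the outcome nor any classification weights need to be modeled separately.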

Article information

Electron. J. Statist., Volume 12, Number 2 (2018), 3601-3638.

Received: September 2017
First available in Project Euclid: 31 October 2018

Keywords: Precision medicine, multiple treatments, kernel learning, prescriptive variable selection

Creative Commons Attribution 4.0 International License.


Qi, Zhengling; Liu, Yufeng. D-learning to estimate optimal individual treatment rules. Electron. J. Statist. 12 (2018), no. 2, 3601--3638. doi:10.1214/18-EJS1480. https://projecteuclid.org/euclid.ejs/1540951343


