The Annals of Statistics

High-dimensional $A$-learning for optimal dynamic treatment regimes

Chengchun Shi, Ailin Fan, Rui Song, and Wenbin Lu



Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients’ responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinarily large number of prognostic factors, such as patients’ genetic information, demographic characteristics, medical history and clinical measurements over time, are available, but not all of them are necessary for making treatment decisions. This makes variable selection an emerging need in precision medicine.

In this paper, we propose a penalized multi-stage $A$-learning method for deriving the optimal dynamic treatment regime when the number of covariates is of nonpolynomial (NP) order of the sample size. To preserve the double robustness property of the $A$-learning method, we adopt the Dantzig selector, which directly penalizes the $A$-learning estimating equations. We establish oracle inequalities for the proposed estimators of the parameters in the optimal dynamic treatment regime, as well as error bounds on the difference between the value functions of the estimated and the true optimal dynamic treatment regimes. The empirical performance of the proposed approach is evaluated by simulations and illustrated with an application to data from the STAR∗D study.
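As a rough illustration of the Dantzig selector idea mentioned above, the following sketch solves the classical linear-regression version of the problem: minimize $\|\beta\|_1$ subject to $\|X^\top(y - X\beta)/n\|_\infty \le \lambda$. It is not the penalized multi-stage $A$-learning estimator proposed in the paper, whose estimating equations are not given in this abstract. The function name `dantzig_selector`, the tuning value `lam` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, lam):
    """Classical Dantzig selector for the linear model y = X @ beta + noise.

    Solves  min ||beta||_1  subject to  ||X.T @ (y - X @ beta) / n||_inf <= lam
    as a linear program, writing beta = u - v with u, v >= 0.
    (Illustrative helper; not the estimator studied in the paper.)
    """
    n, p = X.shape
    G = X.T @ X / n          # Gram matrix of the covariates
    c = X.T @ y / n          # empirical covariate-response cross-products

    # Objective: sum(u) + sum(v), which equals ||beta||_1 at the optimum.
    cost = np.ones(2 * p)

    # |c - G @ (u - v)| <= lam, written as two blocks of "<=" constraints.
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([c + lam, lam - c])

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)   # simulated data, for illustration only
    n, p = 400, 10
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    X = rng.standard_normal((n, p))
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    print(np.round(dantzig_selector(X, y, lam=0.05), 2))
```

The constraint bounds the sup-norm of the least-squares score rather than adding a penalty to a loss function. This is the sense in which a Dantzig-type selector can be applied directly to a set of estimating equations.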

Article information

Ann. Statist., Volume 46, Number 3 (2018), 925-957.

Received: January 2016
Revised: January 2017
First available in Project Euclid: 3 May 2018


Primary: 62C99: None of the above, but in this section
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords: $A$-learning; Dantzig selector; NP-dimensionality; model misspecification; optimal dynamic treatment regime; oracle inequality


Shi, Chengchun; Fan, Ailin; Song, Rui; Lu, Wenbin. High-dimensional $A$-learning for optimal dynamic treatment regimes. Ann. Statist. 46 (2018), no. 3, 925--957. doi:10.1214/17-AOS1570.




Supplemental materials

  • Supplement to “High-dimensional $\boldsymbol{A}$-learning for optimal dynamic treatment regimes”. Supplementary material includes some proofs.