## The Annals of Applied Statistics

### Dynamic prediction of disease progression for leukemia patients by functional principal component analysis of longitudinal expression levels of an oncogene

#### Abstract

Patients’ biomarker data are repeatedly measured over time during their follow-up visits. Statistical models are needed to predict disease progression on the basis of these longitudinal biomarker data. Such predictions must be made in real time, so that whenever a new biomarker measurement is obtained, the prediction can be updated immediately to reflect the patient’s latest prognosis and further treatment can be initiated as necessary. This is called dynamic prediction. The challenge is that longitudinal biomarker values fluctuate over time, and their patterns of change vary greatly across patients. In this article, we apply functional principal component analysis (FPCA) to longitudinal biomarker data to extract their features, and use these features as covariates in a Cox proportional hazards model to conduct dynamic predictions. Our flexible approach comprehensively characterizes the trajectory patterns of the longitudinal biomarker data. Simulation studies demonstrate its robust performance for dynamic prediction under various scenarios. The proposed method is applied to dynamically predict the risk of disease progression for patients with chronic myeloid leukemia following treatment with tyrosine kinase inhibitors. FPCA is applied to their longitudinal measurements of BCR-ABL gene expression levels during follow-up visits, and the resulting patterns of change over time serve as predictors.
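
In standard FPCA notation, the two-step idea described above can be sketched as follows. All symbols here ($X_i$, $\mu$, $\phi_k$, $\xi_{ik}$, $K$, $\beta_k$, $\lambda_0$, $\Lambda_0$) are illustrative assumptions rather than notation taken from the article, and the authors' exact specification may differ. Suppose patient $i$ has measurements $Y_{ij}$ at visit times $t_{ij}$, viewed as noisy observations of a smooth trajectory $X_i(t)$ that is expanded in $K$ leading eigenfunctions of its covariance:

$$
Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}, \qquad
X_i(t) \approx \mu(t) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(t), \qquad
\xi_{ik} = \int \{X_i(t) - \mu(t)\}\,\phi_k(t)\,dt .
$$

The estimated scores $\hat\xi_{ik}$ summarize the shape of each trajectory and can enter a Cox proportional hazards model as covariates:

$$
\lambda_i(t \mid \hat\xi_i) = \lambda_0(t)\,\exp\Big(\sum_{k=1}^{K} \beta_k\,\hat\xi_{ik}\Big).
$$

For a dynamic prediction at time $s$, one plausible reading is that the scores $\hat\xi_i(s)$ are re-estimated from the measurements available up to $s$ and then treated as fixed covariates. Under that assumption, the conditional survival probability over a further horizon $u$ is

$$
P(T_i > s + u \mid T_i > s,\ \hat\xi_i(s)) = \exp\Big[-\{\Lambda_0(s+u) - \Lambda_0(s)\}\,\exp\{\beta^{\top}\hat\xi_i(s)\}\Big],
$$

where $\Lambda_0(t) = \int_0^t \lambda_0(v)\,dv$ is the cumulative baseline hazard. Each new measurement changes $\hat\xi_i(s)$ and therefore updates the predicted risk.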

#### Article information

Source
Ann. Appl. Stat., Volume 11, Number 3 (2017), 1649–1670.

Dates
Revised: April 2017
First available in Project Euclid: 5 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1507168843

Digital Object Identifier
doi:10.1214/17-AOAS1050

Mathematical Reviews number (MathSciNet)
MR3709573

Zentralblatt MATH identifier
1380.62261

#### Citation

Yan, Fangrong; Lin, Xiao; Huang, Xuelin. Dynamic prediction of disease progression for leukemia patients by functional principal component analysis of longitudinal expression levels of an oncogene. Ann. Appl. Stat. 11 (2017), no. 3, 1649–1670. doi:10.1214/17-AOAS1050. https://projecteuclid.org/euclid.aoas/1507168843
