Electronic Journal of Statistics

On last observation carried forward and asynchronous longitudinal regression analysis

Hongyuan Cao, Jialiang Li, and Jason P. Fine


In many longitudinal studies, the covariates and response are observed intermittently at irregular, mismatched, subject-specific times. Last observation carried forward (LOCF) is one of the most commonly used methods for handling such data when covariates and response are observed asynchronously. However, LOCF can lead to considerable bias. In this paper, we propose a weighted LOCF estimation using asynchronous longitudinal data for the generalized linear model. We further generalize this approach to utilize previously observed covariates in addition to the most recent observation. In comparison to earlier methods, the current methods are valid under weaker assumptions on the covariate process and allow informative observation times, which may depend on the response even conditional on covariates. Extensive simulation studies provide numerical support for the theoretical findings. Data from an HIV study are used to illustrate our methodology.
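To make the idea of carrying the last covariate forward and then down-weighting stale pairs concrete, the following is a minimal sketch for a single subject. It is an illustration under stated assumptions, not the estimator from the paper: the function names (`locf_pairs`, `epanechnikov`, `weighted_locf_fit`), the Epanechnikov kernel, the identity link, and the bandwidth `h` are all hypothetical choices made here for illustration.

```python
import numpy as np

def locf_pairs(resp_times, cov_times, cov_values):
    """Pair each response time with the most recent covariate at or before it.

    cov_times must be sorted in increasing order. Returns the carried-forward
    covariate and the time lag; both are NaN where no earlier covariate exists.
    """
    resp_times = np.asarray(resp_times, dtype=float)
    cov_times = np.asarray(cov_times, dtype=float)
    cov_values = np.asarray(cov_values, dtype=float)
    # Index of the last covariate time <= each response time (-1 if none).
    idx = np.searchsorted(cov_times, resp_times, side="right") - 1
    valid = idx >= 0
    x = np.full(resp_times.shape, np.nan)
    lag = np.full(resp_times.shape, np.nan)
    x[valid] = cov_values[idx[valid]]
    lag[valid] = resp_times[valid] - cov_times[idx[valid]]
    return x, lag

def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice); zero outside [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def weighted_locf_fit(x, y, lag, h):
    """Weighted least squares for y = b0 + b1 * x (identity link assumed).

    Each pair gets weight K(lag / h) / h, so covariates carried forward
    over a long gap contribute little or nothing.
    """
    x, y, lag = (np.asarray(a, dtype=float) for a in (x, y, lag))
    keep = np.isfinite(x) & np.isfinite(lag)  # drop responses with no prior covariate
    x, y, lag = x[keep], y[keep], lag[keep]
    w = epanechnikov(lag / h) / h
    X = np.column_stack([np.ones_like(x), x])
    WX = X * w[:, None]
    return np.linalg.solve(X.T @ WX, WX.T @ y)

# Toy data: covariate and response observed at mismatched times.
cov_times = [0.0, 1.0, 2.5]
cov_values = [1.0, 2.0, 3.0]
resp_times = [0.5, 1.2, 4.0]
y = np.array([3.0, 5.0, 100.0])  # third response follows a long gap

x, lag = locf_pairs(resp_times, cov_times, cov_values)
beta = weighted_locf_fit(x, y, lag, h=1.0)
```

In this toy example the third response is paired with a covariate observed 1.5 time units earlier, which falls outside the bandwidth `h = 1.0` and therefore receives zero weight. An unweighted LOCF fit would treat all three pairs equally.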

Article information

Electron. J. Statist., Volume 10, Number 1 (2016), 1155-1180.

Received: November 2015
First available in Project Euclid: 3 May 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1462297865

Digital Object Identifier
doi:10.1214/16-EJS1141

Primary: 60G20: Generalized stochastic processes
Secondary: 60G05: Foundations of stochastic processes

Keywords: Asynchronous longitudinal data; kernel weighted estimation; last observation carried forward; nonparametric regression


Cao, Hongyuan; Li, Jialiang; Fine, Jason P. On last observation carried forward and asynchronous longitudinal regression analysis. Electron. J. Statist. 10 (2016), no. 1, 1155--1180. doi:10.1214/16-EJS1141. https://projecteuclid.org/euclid.ejs/1462297865

