The Annals of Applied Statistics

Development of a common patient assessment scale across the continuum of care: A nested multiple imputation approach

Chenyang Gu and Roee Gutman



Evaluating and tracking patients’ functional status through the post-acute care continuum requires a common instrument. However, different post-acute service providers such as nursing homes, inpatient rehabilitation facilities and home health agencies rely on different instruments to evaluate patients’ functional status. These instruments assess similar functional status domains, but they comprise different activities, rating scales and scoring instructions. These differences hinder the comparison of patients’ assessments across health care settings. We propose a two-step procedure that combines nested multiple imputation with the multivariate ordinal probit (MVOP) model to obtain a common patient assessment scale across the post-acute care continuum. Our procedure imputes the unmeasured assessments at multiple assessment dates and enables evaluation and comparison of the rates of functional improvement experienced by patients treated in different health care settings using a common measure. To generate multiple imputations of the unmeasured assessments using the MVOP model, a likelihood-based approach that combines the EM algorithm and the bootstrap method as well as a fully Bayesian approach using the data augmentation algorithm are developed. Using a dataset on patients who suffered a stroke, we simulate missing assessments and compare the MVOP model to existing methods for imputing incomplete multivariate ordinal variables. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the MVOP model appears to be superior. The proposed procedure is then applied to patients who suffered a stroke and were released from rehabilitation facilities either to skilled nursing facilities or to their homes.
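As a rough illustration of how estimates from nested multiple imputations can be pooled, the sketch below applies the combining rules commonly used for nested multiple imputation to a grid of completed-data estimates. The abstract does not say which pooling rules the authors use, so this is an assumption. The function name `combine_nested_mi` and the array layout (rows are nests, columns are imputations within a nest) are hypothetical choices made for this example only.

```python
import numpy as np


def combine_nested_mi(q, u):
    """Pool estimates from nested multiple imputation (illustrative sketch).

    q[m, n] is the completed-data point estimate from imputation n in nest m;
    u[m, n] is its estimated sampling variance. Returns the pooled estimate
    and its total variance under the usual nested combining rules (assumed
    here; not taken from the article).
    """
    q = np.asarray(q, dtype=float)
    u = np.asarray(u, dtype=float)
    if q.ndim != 2 or q.shape != u.shape:
        raise ValueError("q and u must be 2-D arrays of the same shape")
    n_nests, n_within = q.shape
    if n_nests < 2 or n_within < 2:
        raise ValueError("need at least 2 nests and 2 imputations per nest")

    q_nest = q.mean(axis=1)   # mean estimate within each nest
    q_bar = q.mean()          # overall pooled point estimate
    u_bar = u.mean()          # average completed-data variance

    # Between-nest variance of the nest means.
    b = ((q_nest - q_bar) ** 2).sum() / (n_nests - 1)
    # Within-nest variance of the estimates around their nest mean.
    w = ((q - q_nest[:, None]) ** 2).sum() / (n_nests * (n_within - 1))

    # Total variance: U-bar + (1 + 1/M) * B + (1 - 1/N) * W.
    total_var = u_bar + (1 + 1 / n_nests) * b + (1 - 1 / n_within) * w
    return q_bar, total_var


if __name__ == "__main__":
    # Toy numbers only: 2 nests, 2 imputations per nest.
    estimate, variance = combine_nested_mi([[1.0, 3.0], [5.0, 7.0]],
                                           [[1.0, 1.0], [1.0, 1.0]])
    print(estimate, variance)
```

In a two-step procedure like the one described, the nests would plausibly correspond to draws from the first imputation step and the columns to draws from the second step conditional on each first-step draw. That mapping is an interpretation, not something stated in the abstract.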

Article information

Ann. Appl. Stat., Volume 13, Number 1 (2019), 466-491.

Received: September 2017
Revised: July 2018
First available in Project Euclid: 10 April 2019


Keywords: data augmentation; EM algorithm; missing data; nested multiple imputation; multivariate ordinal probit model; slice sampler


Gu, Chenyang; Gutman, Roee. Development of a common patient assessment scale across the continuum of care: A nested multiple imputation approach. Ann. Appl. Stat. 13 (2019), no. 1, 466–491. doi:10.1214/18-AOAS1202.




Supplemental materials

  • Supplement to “Development of a common patient assessment scale across the continuum of care: A nested multiple imputation approach”. The supplement includes the Slice Sampler Algorithm for the MVOP model, additional results in the simulation study of Section 4, results of posterior predictive checks in Section 5.3 and computer code for an example to illustrate the proposed procedure.