Bayesian Analysis

A Dirichlet Process Mixture Model for Non-Ignorable Dropout

Camille M. Moore, Nichole E. Carlson, Samantha MaWhinney, and Sarah Kreidler

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access


Longitudinal cohorts are a valuable resource for studying HIV disease progression; however, dropout is common in these studies. Subjects often fail to return for visits due to disease progression, loss to follow-up, or death. When dropout depends on unobserved outcomes, data are missing not at random, and results from standard longitudinal data analyses can be biased. Several methods have been proposed to adjust for non-ignorable dropout; however, many of these approaches rely on parametric assumptions about the distribution of dropout times and the functional form of the relationship between the outcome and dropout time. More flexible approaches may be needed when the distribution of dropout times does not follow a known distribution or violates proportional hazards assumptions, or when the relationship between the outcome and dropout times does not have a simple polynomial form. We propose a Bayesian semi-parametric Dirichlet process mixture model to flexibly model the relationship between dropout time and the outcome and show that more accurate inference can be obtained by non-parametrically modeling the distribution of subject-specific effects as well as the distribution of dropout times. Results from simulation studies as well as an application to a longitudinal HIV cohort study database illustrate the strengths of our Bayesian semi-parametric approach.
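As an illustration of the kind of prior the abstract refers to, the following minimal sketch draws dropout times from a truncated stick-breaking representation of a Dirichlet process mixture. It is not the authors' model or code: the log-normal kernel, the base-measure settings, the truncation level, and the names `stick_breaking_weights` and `sample_dropout_times` are all assumptions chosen for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    v = rng.beta(1.0, alpha, size=n_components)
    v[-1] = 1.0  # truncation: the last component takes whatever stick remains
    # Length of stick left before each break: 1, (1-v1), (1-v1)(1-v2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_dropout_times(n_subjects, alpha=1.0, n_components=20, seed=0):
    """Draw dropout times from a truncated DP mixture of log-normal kernels."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, n_components, rng)
    # Hypothetical base measure: component means and SDs on the log-time scale.
    means = rng.normal(loc=1.0, scale=0.75, size=n_components)
    sds = rng.uniform(0.1, 0.5, size=n_components)
    # Assign each subject to a mixture component, then draw that subject's time.
    labels = rng.choice(n_components, size=n_subjects, p=weights)
    times = np.exp(rng.normal(means[labels], sds[labels]))
    return times, labels, weights


times, labels, weights = sample_dropout_times(n_subjects=200)
print(f"clusters used: {np.unique(labels).size}, "
      f"median dropout time: {np.median(times):.2f}")
```

Because subjects share mixture components, the implied distribution of dropout times can be multimodal or skewed without committing to a single parametric family, which is the flexibility the abstract describes. Smaller values of `alpha` concentrate the weight on fewer components.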

Article information

Bayesian Anal., Advance publication (2018), 29 pages.

First available in Project Euclid: 30 October 2019

Digital Object Identifier

doi:10.1214/19-BA1181

Keywords

Dirichlet process mixture model; missing data; dropout; MCMC

Creative Commons Attribution 4.0 International License.


Moore, Camille M.; Carlson, Nichole E.; MaWhinney, Samantha; Kreidler, Sarah. A Dirichlet Process Mixture Model for Non-Ignorable Dropout. Bayesian Anal., advance publication, 30 October 2019. doi:10.1214/19-BA1181.

