The Annals of Applied Statistics

Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples

Yajuan Si, Jerome P. Reiter, and D. Sunshine Hillygus



Many panel studies collect refreshment samples—new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases caused by nonignorable attrition. We present such a model when the panel includes many categorical survey variables. The model relies on a Bayesian latent pattern mixture model, in which an indicator for attrition and the survey variables are modeled jointly via a latent class model. We allow the multinomial probabilities within classes to depend on the attrition indicator, which offers additional flexibility over standard applications of latent class models. We present results of simulation studies that illustrate the benefits of this flexibility. We apply the model to correct attrition bias in an analysis of data from the 2007–2008 Associated Press/Yahoo News election panel study.
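The structure described in the abstract can be illustrated with a small generative sketch. This is a toy illustration under stated assumptions, not the authors' specification: the function name `simulate_blpm`, the parameter names `pi`, `rho`, and `phi`, and all numeric values are hypothetical. Each respondent gets a latent class, an attrition indicator drawn given that class, and categorical responses whose probabilities depend on both the class and the attrition indicator. A standard latent class model would instead use the same response probabilities for both values of the indicator.

```python
import numpy as np

def simulate_blpm(n, pi, rho, phi, rng):
    """Draw (z, w, y) from a toy latent pattern mixture model.

    pi  : (K,) latent class probabilities (hypothetical values)
    rho : (K,) P(W = 1 | class), where W = 1 marks a unit that drops out
    phi : (K, 2, p, d) category probabilities for p items with d levels,
          indexed by latent class AND by the attrition indicator
    """
    K, _, p, d = phi.shape
    z = rng.choice(K, size=n, p=pi)   # latent class for each respondent
    w = rng.binomial(1, rho[z])       # attrition indicator given class
    y = np.empty((n, p), dtype=int)
    for i in range(n):
        for j in range(p):
            # Response probabilities vary with class z[i] and indicator w[i].
            y[i, j] = rng.choice(d, p=phi[z[i], w[i], j])
    return z, w, y

rng = np.random.default_rng(0)
K, p, d = 3, 4, 3                     # classes, items, levels per item
pi = np.array([0.5, 0.3, 0.2])
rho = np.array([0.1, 0.3, 0.6])
phi = rng.dirichlet(np.ones(d), size=(K, 2, p))  # shape (K, 2, p, d)
z, w, y = simulate_blpm(500, pi, rho, phi, rng)
```

Setting `phi[:, 0]` equal to `phi[:, 1]` recovers the standard latent class case, in which responses are independent of attrition within each class. The sketch covers simulation only; it does not implement the paper's estimation procedure or its use of refreshment samples.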

Article information

Ann. Appl. Stat., Volume 10, Number 1 (2016), 118-143.

Received: December 2014
Revised: August 2015
First available in Project Euclid: 25 March 2016


Keywords: Panel attrition; refreshment sample; categorical; Dirichlet process; multiple imputation; nonignorable


Si, Yajuan; Reiter, Jerome P.; Hillygus, D. Sunshine. Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples. Ann. Appl. Stat. 10 (2016), no. 1, 118–143. doi:10.1214/15-AOAS876.




Supplemental materials

  • Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples. The supplement includes the MCMC algorithms for the BLPM and DPMPM models, additional analyses of the APYN data using the DPMPM model and semi-parametric AN model, and details of the BLPM model diagnostics.