Bayesian Analysis

Bayesian Mixture Models with Focused Clustering for Mixed Ordinal and Nominal Data

Maria DeYoreo, Jerome P. Reiter, and D. Sunshine Hillygus

Abstract

In some contexts, mixture models can fit certain variables well at the expense of others in ways beyond the analyst’s control. For example, when the data include some variables with non-trivial amounts of missing values, the mixture model may fit the marginal distributions of the nearly and fully complete variables at the expense of the variables with high fractions of missing data. Motivated by this setting, we present a mixture model for mixed ordinal and nominal data that splits variables into two groups, focus variables and remainder variables. The model allows the analyst to specify a rich sub-model for the focus variables and a simpler sub-model for remainder variables, yet still capture associations among the variables. Using simulations, we illustrate advantages and limitations of focused clustering compared to mixture models that do not distinguish variables. We apply the model to handle missing values in an analysis of the 2012 American National Election Study, estimating relationships among voting behavior, ideology, and political party affiliation.

Article information

Source
Bayesian Anal., Volume 12, Number 3 (2017), 679-703.

Dates
First available in Project Euclid: 17 August 2016

Permanent link to this document
https://projecteuclid.org/euclid.ba/1471454533

Digital Object Identifier
doi:10.1214/16-BA1020

Mathematical Reviews number (MathSciNet)
MR3655872

Zentralblatt MATH identifier
1384.62192

Keywords
categorical, missing, mixture model, multiple imputation

Rights
Creative Commons Attribution 4.0 International License.

Citation

DeYoreo, Maria; Reiter, Jerome P.; Hillygus, D. Sunshine. Bayesian Mixture Models with Focused Clustering for Mixed Ordinal and Nominal Data. Bayesian Anal. 12 (2017), no. 3, 679–703. doi:10.1214/16-BA1020. https://projecteuclid.org/euclid.ba/1471454533

