Bayesian Analysis

Bayesian Zero-Inflated Negative Binomial Regression Based on Pólya-Gamma Mixtures

Brian Neelon

Full-text: Open access

Abstract

Motivated by a study examining spatiotemporal patterns in inpatient hospitalizations, we propose an efficient Bayesian approach for fitting zero-inflated negative binomial models. To facilitate posterior sampling, we introduce a set of latent variables that are represented as scale mixtures of normals, where the precision terms follow independent Pólya-Gamma distributions. Conditional on the latent variables, inference proceeds via straightforward Gibbs sampling. For fixed-effects models, our approach is comparable to existing methods. However, our model can accommodate more complex data structures, including multivariate and spatiotemporal data, settings in which current approaches often fail due to computational challenges. Using simulation studies, we highlight key features of the method and compare its performance to other estimation procedures. We apply the approach to a spatiotemporal analysis examining the number of annual inpatient admissions among United States veterans with type 2 diabetes.
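The data-augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it covers only a logistic component such as the one governing zero inflation, it approximates Pólya-Gamma draws by truncating the sum-of-gammas series of Polson, Scott, and Windle (2013), and it assumes a zero-mean normal prior. The function names (`sample_pg_truncated`, `pg_mean`, `gibbs_update_logistic`), the truncation level, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def sample_pg_truncated(b, c, rng, n_terms=200):
    """Approximate PG(b, c) draws via a truncated sum-of-gammas series.

    Uses PG(b, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1). Truncating at n_terms makes the draws approximate.
    """
    b, c = np.broadcast_arrays(np.atleast_1d(np.asarray(b, dtype=float)),
                               np.atleast_1d(np.asarray(c, dtype=float)))
    k = np.arange(1, n_terms + 1)
    # One row per observation, one column per series term.
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    g = rng.gamma(shape=b[:, None], scale=1.0, size=denom.shape)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def pg_mean(b, c):
    """Closed-form mean of PG(b, c) for c != 0: b / (2c) * tanh(c / 2)."""
    return b / (2.0 * c) * np.tanh(c / 2.0)


def gibbs_update_logistic(X, w, alpha, rng, prior_prec, n_terms=200):
    """One Gibbs scan for logistic coefficients given binary outcomes w.

    Assumes a N(0, prior_prec^{-1}) prior on alpha (an illustrative choice).
    """
    eta = X @ alpha
    # Latent precisions: omega_i | alpha ~ PG(1, x_i' alpha).
    omega = sample_pg_truncated(1.0, eta, rng, n_terms)
    kappa = w - 0.5
    # Conditional on omega, alpha has a Gaussian full conditional.
    V = np.linalg.inv(X.T @ (omega[:, None] * X) + prior_prec)
    V = 0.5 * (V + V.T)  # symmetrize against round-off
    m = V @ (X.T @ kappa)
    return rng.multivariate_normal(m, V)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    alpha_true = np.array([-0.5, 1.0])  # simulated truth, for illustration only
    w = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ alpha_true)))

    alpha = np.zeros(2)
    draws = []
    for it in range(300):
        alpha = gibbs_update_logistic(X, w, alpha, rng, 0.01 * np.eye(2))
        if it >= 100:  # discard the first 100 scans as burn-in
            draws.append(alpha)
    print("posterior mean:", np.mean(draws, axis=0))
```

In a zero-inflated negative binomial model, a similar Gaussian update would apply to the count component, with Pólya-Gamma shape parameters that depend on the counts and the dispersion. An exact Pólya-Gamma sampler would normally replace the truncated series used here.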

Article information

Source
Bayesian Anal., Volume 14, Number 3 (2019), 829-855.

Dates
First available in Project Euclid: 11 June 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1560240030

Digital Object Identifier
doi:10.1214/18-BA1132

Zentralblatt MATH identifier
07089628

Keywords
zero inflation; zero-inflated negative binomial; Pólya-Gamma distribution; data augmentation; spatiotemporal data

Rights
Creative Commons Attribution 4.0 International License.

Citation

Neelon, Brian. Bayesian Zero-Inflated Negative Binomial Regression Based on Pólya-Gamma Mixtures. Bayesian Anal. 14 (2019), no. 3, 829–855. doi:10.1214/18-BA1132. https://projecteuclid.org/euclid.ba/1560240030


Supplemental materials

  • Supplementary material for “Bayesian Zero-Inflated Negative Binomial Regression Based on Pólya-Gamma Mixtures”. This supplement contains derivations of the full conditionals discussed in Section 2 (Appendices A and B), additional tables and figures for the simulation studies presented in Section 3 (Appendix C), and additional tables and figures for the case study presented in Section 4 (Appendix D).