Open Access
May 2019
Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review
Lei Liu, Ya-Chen Tina Shih, Robert L. Strawderman, Daowen Zhang, Bankole A. Johnson, Haitao Chai
Statist. Sci. 34(2): 253-279 (May 2019). DOI: 10.1214/18-STS681


Zero-inflated nonnegative continuous (or semicontinuous) data arise frequently in biomedical, economic, and ecological studies. Examples include substance abuse, medical costs, medical care utilization, biomarkers (e.g., CD4 cell counts, coronary artery calcium scores), single-cell gene expression rates, and (relative) microbiome abundance. Such data are often characterized by a large proportion of zero values and by positive continuous values that are right-skewed and heteroscedastic. Both features suggest that no simple parametric distribution may be suitable for modeling this type of outcome. In this paper, we review statistical methods for analyzing zero-inflated nonnegative outcome data. We start with the cross-sectional setting, discussing ways to separate zero and positive values and introducing flexible models to characterize right skewness and heteroscedasticity in the positive values. We then present models for correlated zero-inflated nonnegative continuous data, using random effects to account for both the correlation among repeated measures from the same subject and the correlation across different parts of the model. We also discuss extensions to related topics, for example, zero-inflated count and survival data, nonlinear covariate effects, and joint models of longitudinal zero-inflated nonnegative continuous data and survival. Finally, we present applications to three real datasets (microbiome, medical costs, and alcohol drinking) to illustrate these methods. Example code is provided to facilitate application of these methods.
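To make the idea of separating zero and positive values concrete, the following is a minimal sketch of a cross-sectional two-part model. It is not the code that accompanies the paper. It assumes a logistic model for the probability of a positive value and a homoscedastic normal linear model for the log of the positive values; the function names (`fit_logistic`, `fit_two_part`, `predict_mean`, `simulate`), the simulated data, and the parameter values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25):
    """Fit a logistic regression of binary z on X by Newton-Raphson."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        w = p * (1.0 - p)
        gradient = X.T @ (z - p)
        hessian = X.T @ (X * w[:, None])
        coef = coef + np.linalg.solve(hessian, gradient)
    return coef


def fit_two_part(X, y):
    """Part 1: logistic model for P(y > 0). Part 2: OLS on log(y) for y > 0."""
    positive = y > 0
    alpha = fit_logistic(X, positive.astype(float))
    log_y = np.log(y[positive])
    beta, *_ = np.linalg.lstsq(X[positive], log_y, rcond=None)
    resid = log_y - X[positive] @ beta
    sigma2 = resid @ resid / (positive.sum() - X.shape[1])
    return alpha, beta, sigma2


def predict_mean(X, alpha, beta, sigma2):
    """E[y | X] = P(y > 0) * E[y | y > 0].

    Assumption: log-scale errors are normal with constant variance, so the
    positive-part mean is exp(X beta + sigma2 / 2). Under skewed or
    heteroscedastic errors this retransformation is not valid.
    """
    p_positive = 1.0 / (1.0 + np.exp(-X @ alpha))
    return p_positive * np.exp(X @ beta + sigma2 / 2.0)


def simulate(n, alpha, beta, sigma, seed=0):
    """Simulate semicontinuous outcomes with one covariate (illustrative)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    p_positive = 1.0 / (1.0 + np.exp(-X @ alpha))
    is_positive = rng.random(n) < p_positive
    amount = np.exp(X @ beta + sigma * rng.normal(size=n))
    return X, np.where(is_positive, amount, 0.0)


# Hypothetical true parameters, used only to generate example data.
X, y = simulate(5000, np.array([0.5, 1.0]), np.array([1.0, 0.5]), 0.8)
alpha_hat, beta_hat, sigma2_hat = fit_two_part(X, y)
mean_hat = predict_mean(X, alpha_hat, beta_hat, sigma2_hat)
print("share of zeros:", np.mean(y == 0))
print("logistic part:", alpha_hat)
print("log-positive part:", beta_hat, "sigma^2:", sigma2_hat)
```

The two parts are fitted separately here, which is only appropriate when they share no parameters and no random effects. The correlated-data models mentioned in the abstract link the parts through random effects and would need joint estimation instead.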



Lei Liu, Ya-Chen Tina Shih, Robert L. Strawderman, Daowen Zhang, Bankole A. Johnson, Haitao Chai. "Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review." Statist. Sci. 34(2): 253-279, May 2019.


Published: May 2019
First available in Project Euclid: 19 July 2019

zbMATH: 07110696
MathSciNet: MR3983328
Digital Object Identifier: 10.1214/18-STS681

Keywords: cure rate, frailty model, health econometrics, joint model, semiparametric regression, splines, Tobit model, two-part model

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol.34 • No. 2 • May 2019