## The Annals of Applied Statistics

### A semiparametric modeling approach using Bayesian Additive Regression Trees with an application to evaluate heterogeneous treatment effects

#### Abstract

Bayesian Additive Regression Trees (BART) is a flexible machine learning algorithm capable of capturing nonlinearities between an outcome and covariates and interactions among covariates. We extend BART to a semiparametric regression framework in which the conditional expectation of an outcome is a function of treatment, its effect modifiers, and confounders. The confounders are allowed to have unspecified functional form, while treatment and effect modifiers that are directly related to the research question are given a linear form. The result is a Bayesian semiparametric linear regression model where the posterior distribution of the parameters of the linear part can be interpreted as in parametric Bayesian regression. This is useful in situations where a subset of the variables is of substantive interest and the others are nuisance variables that we would like to control for. An example of this occurs in causal modeling with the structural mean model (SMM). Under certain causal assumptions, our method can be used as a Bayesian SMM. Our methods are demonstrated with simulation studies and an application to a dataset involving adults with HIV/Hepatitis C coinfection who newly initiate antiretroviral therapy. The methods are available in an R package called semibart.
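As a rough sketch of the mean structure the abstract describes, one possible form is shown below. The notation is assumed here for illustration and does not come from the abstract: $Y$ is the outcome, $A$ the treatment, $X$ the effect modifiers, $L$ the confounders (which may include $X$), $\omega$ the unspecified nuisance function, and $\psi$ the parameters of interest. The particular linear form is one example among many.

$$
E[Y \mid A, X, L] = \omega(L) + \psi_1 A + \psi_2^\top (A X), \qquad \omega(L) = \sum_{j=1}^{m} g_j(L)
$$

In this sketch each $g_j$ is a regression tree given a BART prior, so $\omega$ stays free of any parametric form, while $\psi_1$ and $\psi_2$ enter linearly and their posterior can be read as in parametric Bayesian regression. With $A = 0$ as the reference level, the linear part $\psi_1 a + \psi_2^\top (a X)$ would play the role of the structural mean model contrast under the causal assumptions the abstract mentions.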

#### Article information

Source
Ann. Appl. Stat., Volume 13, Number 3 (2019), 1989–2010.

Dates
Revised: May 2019
First available in Project Euclid: 17 October 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1571277780

Digital Object Identifier
doi:10.1214/19-AOAS1266

Mathematical Reviews number (MathSciNet)
MR4019164

#### Citation

Zeldow, Bret; Lo Re III, Vincent; Roy, Jason. A semiparametric modeling approach using Bayesian Additive Regression Trees with an application to evaluate heterogeneous treatment effects. Ann. Appl. Stat. 13 (2019), no. 3, 1989–2010. doi:10.1214/19-AOAS1266. https://projecteuclid.org/euclid.aoas/1571277780

#### Supplemental materials

• Supplement A: R code for semi-BART manuscript. The supplement contains R code for the simulations, analysis code for our data application, and R code for some additional simulations performed.