Flexible instrumental variable models with Bayesian additive regression trees
Charles Spanbauer, Wei Pan
Ann. Appl. Stat. 18(2): 1471-1489 (June 2024). DOI: 10.1214/23-AOAS1843


Instrumental variable methods have been a fundamental statistical approach to causal estimation in the presence of unmeasured confounding, which commonly arises in the nonrandomized observational data typical of fields such as economics and public health. However, such methods traditionally make restrictive linearity and additivity assumptions that are ill-suited to many of today's complex modeling challenges. The growing volume of observational data being collected could benefit from flexible regression modeling that still retains the ability to control for confounding through instrumental variables. Therefore, this article presents a flexible instrumental variable regression model based on Bayesian regression tree ensembles to estimate the causal exposure-outcome relationship, including interactions with covariates, in the presence of confounding. One promising application of this method is Mendelian randomization, which uses genetic variants as instruments. We illustrate our flexible Bayesian instrumental variable regression tree method with an example from the UK Biobank, relating body mass index to blood pressure with genetic variants as the instruments. Body mass index is one factor hypothesized to have a nonlinear relationship with cardiovascular risk factors, such as blood pressure, while also interacting with age. Heterogeneity across patient characteristics, such as age, could be clinically interesting from a precision medicine perspective, which emphasizes individualized treatment.
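To make the confounding problem concrete, the following is a minimal Python (NumPy-only) sketch of classical linear two-stage least squares (2SLS), the kind of linear, additive instrumental variable model the abstract contrasts with. It is not the authors' Bayesian regression tree method. The data are simulated under stated assumptions: ten hypothetical genetic variants coded as allele counts (0/1/2) act as instruments, an unmeasured confounder `u` affects both the exposure and the outcome, and the true linear effect is 0.5. The function name `two_stage_least_squares` and all simulation parameters are illustrative choices.

```python
import numpy as np


def two_stage_least_squares(y, x, z):
    """Classical linear 2SLS point estimate of the effect of x on y.

    y: (n,) outcome, x: (n,) exposure, z: (n, k) instruments.
    Returns [intercept, slope] from the second-stage regression.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])  # first-stage design with intercept
    # Stage 1: regress the exposure on the instruments, keep fitted values
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the instrument-predicted exposure
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 50_000
# Hypothetical instruments: allele counts for 10 variants (assumed frequency 0.3)
z = rng.binomial(2, 0.3, size=(n, 10)).astype(float)
u = rng.normal(size=n)  # unmeasured confounder
# Exposure driven by instruments and by the confounder (simulated, BMI-like)
x = z @ np.full(10, 0.3) + u + rng.normal(size=n)
true_effect = 0.5
# Outcome depends linearly on exposure and is also confounded by u
y = true_effect * x + 2.0 * u + rng.normal(size=n)

naive = np.polyfit(x, y, 1)[0]  # ordinary least squares slope, biased by u
iv = two_stage_least_squares(y, x, z)[1]  # 2SLS slope
print(f"naive OLS: {naive:.2f}, 2SLS: {iv:.2f}, truth: {true_effect}")
```

Because `u` raises both the exposure and the outcome, the naive slope is biased upward, while 2SLS, which uses only the exposure variation explained by the instruments, should land close to 0.5. The sketch reports point estimates only; standard errors from running the second stage as an ordinary regression are not valid 2SLS standard errors. The article's approach replaces these linear, additive assumptions with Bayesian regression tree ensembles, which this sketch does not attempt.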

Funding Statement

This work was supported by the National Institutes of Health grant R01HL116720.


The authors would first like to acknowledge the Minnesota Supercomputing Institute at the University of Minnesota (http://www.msi.umn.edu) for providing resources that contributed to the research results reported in this paper. We would also like to acknowledge the UK Biobank for providing the data used in the analysis section. The application number to access the UK Biobank data is #35107. Finally, we would like to thank the peer reviewers and editors who offered helpful comments about the direction of this work.



Charles Spanbauer. Wei Pan. "Flexible instrumental variable models with Bayesian additive regression trees." Ann. Appl. Stat. 18 (2) 1471 - 1489, June 2024. https://doi.org/10.1214/23-AOAS1843


Received: 1 May 2022; Revised: 1 July 2023; Published: June 2024
First available in Project Euclid: 5 April 2024

Digital Object Identifier: 10.1214/23-AOAS1843

Keywords: causality , Genetics , instrumental variables , machine learning , Mendelian randomization

Rights: Copyright © 2024 Institute of Mathematical Statistics

