## Bayesian Analysis

### High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors

#### Abstract

In observational studies, estimating the causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of potential confounders ($p$) is larger than the number of observations ($n$), then direct control for all potential confounders is infeasible. Existing approaches for dimension reduction and penalization are generally aimed at predicting the outcome and are less suited to estimating causal effects. Under standard penalization approaches (e.g., the lasso), if a variable $X_{j}$ is strongly associated with the treatment $T$ but only weakly associated with the outcome $Y$, the coefficient $\beta_{j}$ will be shrunk towards zero, leading to confounding bias. Under the assumptions of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients $\beta_{j}$ corresponding to the potential confounders $X_{j}$. Specifically, we introduce a prior distribution that does not heavily shrink towards zero the coefficients $\beta_{j}$ of those $X_{j}$ that are strongly associated with $T$ but weakly associated with $Y$. We compare our proposed approach to several state-of-the-art methods from the literature. Our approach has the following features: 1) it reduces confounding bias in high-dimensional settings; 2) it shrinks the coefficients of instrumental variables towards zero; and 3) it achieves good coverage even with small sample sizes. We apply our approach to National Health and Nutrition Examination Survey (NHANES) data to estimate the causal effects of persistent pesticide exposure on triglyceride levels.
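
The bias mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not fit a lasso or a spike and slab prior. It only compares ordinary least squares with and without a single confounder, which mimics the limiting case in which a penalized fit sets $\beta_{j}$ exactly to zero. The function name `simulate_bias` and the parameter values (`tau`, `gamma`, `beta`) are assumptions chosen for illustration.

```python
import numpy as np


def simulate_bias(n=5000, tau=1.0, gamma=2.0, beta=0.3, seed=0):
    """Estimate the treatment effect with and without adjusting for X.

    Hypothetical data-generating model (illustrative values only):
        X ~ N(0, 1)
        T = gamma * X + noise          (X strongly associated with T)
        Y = tau * T + beta * X + noise (X weakly associated with Y)
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    t = gamma * x + rng.normal(size=n)
    y = tau * t + beta * x + rng.normal(size=n)

    # Unadjusted fit: X is left out, as if its coefficient were shrunk to zero.
    design_unadj = np.column_stack([np.ones(n), t])
    tau_unadj = np.linalg.lstsq(design_unadj, y, rcond=None)[0][1]

    # Adjusted fit: X is included, so its coefficient is estimated freely.
    design_adj = np.column_stack([np.ones(n), t, x])
    tau_adj = np.linalg.lstsq(design_adj, y, rcond=None)[0][1]

    return tau_unadj, tau_adj


tau_unadj, tau_adj = simulate_bias()
# In large samples the unadjusted estimate is off by about
# beta * gamma / (gamma**2 + 1), which is 0.12 for the values above.
print(f"unadjusted estimate: {tau_unadj:.3f}")
print(f"adjusted estimate:   {tau_adj:.3f}")
```

Even though `beta` is small, the strong dependence of $T$ on $X$ means that dropping $X$ shifts the estimated treatment effect noticeably. This is the situation the proposed prior is designed to avoid.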

#### Article information

Source: Bayesian Anal., Volume 14, Number 3 (2019), 805–828.

Dates: First available in Project Euclid: 11 June 2019

Permanent link: https://projecteuclid.org/euclid.ba/1560240029

Digital Object Identifier: doi:10.1214/18-BA1131

Mathematical Reviews number (MathSciNet): MR3960772

Zentralblatt MATH identifier: 07089627

#### Citation

Antonelli, Joseph; Parmigiani, Giovanni; Dominici, Francesca. High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors. Bayesian Anal. 14 (2019), no. 3, 805–828. doi:10.1214/18-BA1131. https://projecteuclid.org/euclid.ba/1560240029

#### Supplemental materials

• Supplementary materials for “High-dimensional confounding adjustment using continuous spike and slab priors”. Here we give additional details and derivations for estimation of the empirical Bayes variance and posterior calculation. We further illustrate the estimation of the posterior mode of our model and give additional simulation results. An R package implementing the approach for both binary and continuous outcomes is available at github.com/jantonelli111/HDconfounding.