The Annals of Applied Statistics

Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care

D. Fouskakis, I. Ntzoufras, and D. Draper

Source: Ann. Appl. Stat. Volume 3, Number 2 (2009), 663-690.

Abstract

In the field of quality of health care measurement, one approach to assessing patient sickness at admission involves a logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing classical variable selection methods to find an “optimal” subset of 10–20 indicators. Such “benefit-only” methods ignore the considerable differences among the sickness indicators in cost of data collection, an issue that is crucial when admission sickness is used to drive programs (now implemented or under consideration in several countries, including the U.S. and U.K.) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness). When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which costly variables that do not predict well enough should be omitted from the final scale.

In this paper (a) we develop a method for solving this problem based on posterior model odds, arising from a prior distribution that (1) accounts for the cost of each variable and (2) results in a set of posterior model probabilities that corresponds to a generalized cost-adjusted version of the Bayesian information criterion (BIC), and (b) we compare this method with a decision-theoretic cost-benefit approach based on maximizing expected utility. We use reversible-jump Markov chain Monte Carlo (RJMCMC) methods to search the model space, and we check the stability of our findings with two variants of the MCMC model composition (MC^3) algorithm. We find substantial agreement between the decision-theoretic and cost-adjusted-BIC methods; the latter provides a principled approach to performing a cost-benefit trade-off that avoids ambiguities in identification of an appropriate utility structure. Our cost-benefit approach results in a set of models with a noticeable reduction in cost and dimensionality, and only a minor decrease in predictive performance, when compared with models arising from benefit-only analyses.
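
The abstract does not state the cost-adjusted criterion itself, so the following is only a minimal sketch of how a cost-weighted BIC could be scored, not the authors' definition. It assumes (hypothetically) that each included variable's usual log(n) penalty is scaled by its data-collection cost relative to a baseline cost, and that scores are turned into approximate posterior model probabilities in the usual exp(-score/2) way. The function names, the `baseline_cost` parameter, and all numbers are illustrative assumptions.

```python
import math


def bic(log_lik, n_obs, n_params):
    """Standard BIC (Schwarz, 1978): -2 log L + k log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def cost_adjusted_bic(log_lik, n_obs, costs, baseline_cost):
    """Hypothetical cost-weighted BIC.

    ASSUMPTION (not taken from the abstract): the log(n) penalty of each
    included variable is multiplied by cost / baseline_cost, so a variable
    that costs the baseline amount is penalised exactly as in plain BIC.
    """
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    penalty_weight = sum(c / baseline_cost for c in costs)
    return -2.0 * log_lik + penalty_weight * math.log(n_obs)


def approx_model_probs(scores):
    """Normalise exp(-score / 2) over the candidate models.

    Subtracting the smallest score first keeps the exponentials stable.
    """
    best = min(scores)
    raw = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


# Illustrative numbers only; they are not results from the paper.
n = 2000
cheap = cost_adjusted_bic(-410.0, n, costs=[1.0, 1.5], baseline_cost=1.0)
costly = cost_adjusted_bic(-408.0, n, costs=[1.0, 1.5, 10.0], baseline_cost=1.0)
probs = approx_model_probs([cheap, costly])
```

Under this assumed form, a costly variable has to improve the log-likelihood by proportionally more than a cheap one before the larger model is preferred, which is the trade-off the abstract describes.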

Keywords: Input-output analysis; quality of health care; sickness at hospital admission; cost-benefit analysis; Laplace approximation; reversible-jump Markov chain Monte Carlo (RJMCMC) methods; MCMC model composition (MC^3); Bayesian information criterion (BIC); cost-adjusted BIC

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1245676190
Digital Object Identifier: doi:10.1214/08-AOAS207
Zentralblatt MATH identifier: 1166.62082

2009 © Institute of Mathematical Statistics