Statistical Science

Conflict Diagnostics in Directed Acyclic Graphs, with Applications in Bayesian Evidence Synthesis

Anne M. Presanis, David Ohlssen, David J. Spiegelhalter, and Daniela De Angelis


Complex stochastic models represented by directed acyclic graphs (DAGs) are increasingly employed to synthesise multiple, imperfect and disparate sources of evidence, to estimate quantities that are difficult to measure directly. The various data sources are dependent on shared parameters and hence have the potential to conflict with each other, as well as with the model. In a Bayesian framework, the model consists of three components: the prior distribution, the assumed form of the likelihood and structural assumptions. Any of these components may be incompatible with the observed data. The detection and quantification of such conflict and of data sources that are inconsistent with each other is therefore a crucial component of the model criticism process. We first review Bayesian model criticism, with a focus on conflict detection, before describing a general diagnostic for detecting and quantifying conflict between the evidence in different partitions of a DAG. The diagnostic is a $p$-value based on splitting the information contributing to inference about a “separator” node or group of nodes into two independent groups and testing whether the two groups result in the same inference about the separator node(s). We illustrate the method with three comprehensive examples: an evidence synthesis to estimate HIV prevalence; an evidence synthesis to estimate influenza case-severity; and a hierarchical growth model for rat weights.
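The splitting idea in the abstract can be illustrated with a minimal sketch. It assumes we already hold two independent sets of Monte Carlo draws for a scalar separator node, one from each partition of the DAG, and it summarises their disagreement as a two-sided tail area of the difference. The function name `conflict_p_value`, the simulated normal draws, and the choice of a two-sided tail area are illustrative assumptions, not details taken from the abstract.

```python
import random


def conflict_p_value(samples_a, samples_b):
    """Two-sided tail-area p-value for the difference between two
    independent sets of draws for the same separator node.

    Assumption: the draws are independent across the two partitions,
    so pairing them by position is arbitrary but valid.
    """
    n = min(len(samples_a), len(samples_b))
    diffs = [a - b for a, b in zip(samples_a[:n], samples_b[:n])]
    # Estimated probability that the difference is below zero.
    frac_neg = sum(d < 0 for d in diffs) / n
    # Small values indicate that zero lies in the tail of the difference.
    return 2 * min(frac_neg, 1 - frac_neg)


rng = random.Random(0)

# Hypothetical draws: two partitions that roughly agree about the node.
agree_a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
agree_b = [rng.gauss(0.1, 1.0) for _ in range(20000)]

# Hypothetical draws: two partitions that clearly disagree.
clash_a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
clash_b = [rng.gauss(5.0, 1.0) for _ in range(20000)]

p_agree = conflict_p_value(agree_a, agree_b)
p_clash = conflict_p_value(clash_a, clash_b)
print(f"agreeing partitions:    p = {p_agree:.3f}")
print(f"conflicting partitions: p = {p_clash:.4f}")
```

In a real evidence synthesis the two sets of draws would come from fitting the model with the separator node duplicated, each copy informed by only one partition of the evidence; the simulated normals here merely stand in for such output.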

Article information

Statist. Sci., Volume 28, Number 3 (2013), 376-397.

First available in Project Euclid: 28 August 2013

Keywords: conflict; directed acyclic graph; evidence synthesis; graphical model; model criticism


Presanis, Anne M.; Ohlssen, David; Spiegelhalter, David J.; De Angelis, Daniela. Conflict Diagnostics in Directed Acyclic Graphs, with Applications in Bayesian Evidence Synthesis. Statist. Sci. 28 (2013), no. 3, 376--397. doi:10.1214/13-STS426.
