## The Annals of Statistics

### Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs

#### Abstract

Graphical models are popular statistical tools used to represent complex dependent or causal systems. Statistically equivalent causal or directed graphical models are said to belong to the same Markov equivalence class. It is of great interest to describe and understand the space of such classes. However, with currently known algorithms, sampling over such classes is only feasible for graphs with fewer than approximately 20 vertices. In this paper, we design reversible irreducible Markov chains on the space of Markov equivalence classes by proposing a perfect set of operators that determine the transitions of the Markov chain. The stationary distribution of a proposed Markov chain has a closed form and can be computed easily. Specifically, we construct a concrete perfect set of operators on sparse Markov equivalence classes by introducing appropriate conditions on each possible operator. Algorithms and their accelerated versions are provided to efficiently generate Markov chains and to explore properties of Markov equivalence classes of sparse directed acyclic graphs (DAGs) with thousands of vertices. We find experimentally that in most Markov equivalence classes of sparse DAGs, (1) most edges are directed, (2) most undirected subgraphs are small, and (3) the number of these undirected subgraphs grows approximately linearly with the number of vertices.
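To make the phrase "reversible Markov chain with a closed-form stationary distribution" concrete, the following is a minimal sketch on a toy state space, not the paper's construction. It assumes a small hypothetical undirected graph (`NEIGHBORS`) whose edges stand in for pairs of mutually inverse moves, and a walk that picks one available move uniformly at random. Under those assumptions the stationary distribution is proportional to the number of moves available at each state, which the code checks exactly via detailed balance. All names here are illustrative.

```python
from fractions import Fraction

# Hypothetical toy state space (an assumption, not taken from the paper):
# an undirected, connected graph stored as adjacency lists. Each edge stands
# for a move together with its inverse, so every transition can be undone.
NEIGHBORS = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def transition_prob(x, y):
    """Probability of moving x -> y when a move out of x is chosen uniformly."""
    if y in NEIGHBORS[x]:
        return Fraction(1, len(NEIGHBORS[x]))
    return Fraction(0)


def degree_stationary():
    """Closed-form candidate: pi(x) proportional to the number of moves at x."""
    total = sum(len(nbrs) for nbrs in NEIGHBORS.values())
    return {x: Fraction(len(nbrs), total) for x, nbrs in NEIGHBORS.items()}


def is_reversible(pi):
    """Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for all pairs."""
    return all(
        pi[x] * transition_prob(x, y) == pi[y] * transition_prob(y, x)
        for x in NEIGHBORS
        for y in NEIGHBORS
    )


def is_stationary(pi):
    """Global balance: sum_x pi(x) P(x, y) == pi(y) for every state y."""
    return all(
        sum(pi[x] * transition_prob(x, y) for x in NEIGHBORS) == pi[y]
        for y in NEIGHBORS
    )


PI = degree_stationary()
for state in sorted(PI):
    print(state, PI[state])
print("reversible:", is_reversible(PI), "stationary:", is_stationary(PI))
```

Exact rational arithmetic (`Fraction`) is used so the balance checks are equalities rather than floating-point approximations. On a real state space one would not enumerate all states; the point of a closed form is that the weight of a visited state can be computed locally from that state alone.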

#### Article information

Source
Ann. Statist., Volume 41, Number 4 (2013), 1742-1779.

Dates
First available in Project Euclid: 5 September 2013

Permanent link
https://projecteuclid.org/euclid.aos/1378386238

Digital Object Identifier
doi:10.1214/13-AOS1125

Mathematical Reviews number (MathSciNet)
MR3127848

Zentralblatt MATH identifier
1360.62369

#### Citation

He, Yangbo; Jia, Jinzhu; Yu, Bin. Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. Ann. Statist. 41 (2013), no. 4, 1742–1779. doi:10.1214/13-AOS1125. https://projecteuclid.org/euclid.aos/1378386238

#### Supplemental materials

• Supplementary material: Supplement to “Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs” (doi:10.1214/13-AOS1125SUPP). This supplementary note gives some algorithms, examples, an experiment, and the proofs of the results in the paper.