Bernoulli, Volume 18, Number 1 (2012), 290-321.

Tree cumulants and the geometry of binary tree models

Piotr Zwiernik and Jim Q. Smith

Abstract

In this paper we investigate undirected discrete graphical tree models when all the variables in the system are binary, where leaves represent the observable variables and where all the inner nodes are unobserved. A novel approach based on the theory of partially ordered sets allows us to obtain a convenient parametrization of this model class. The construction of the proposed coordinate system mirrors the combinatorial definition of cumulants. A simple product-like form of the resulting parametrization gives insight into identifiability issues associated with this model class. In particular, we provide necessary and sufficient conditions for such a model to be identified up to the switching of labels of the inner nodes. When these conditions hold, we give explicit formulas for the parameters of the model. Whenever the model fails to be identified, we use the new parametrization to describe the geometry of the unidentified parameter space. We illustrate these results using a simple example.
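For readers unfamiliar with the combinatorial definition of cumulants that the abstract refers to, the following is a minimal sketch of the classical formula, in which a joint cumulant is a sum over all set partitions of the variables, weighted by the Möbius function of the partition lattice, (-1)^(k-1) (k-1)! for a partition into k blocks. It illustrates ordinary cumulants of binary variables only; it is not the tree-cumulant construction of the paper. The function names (`set_partitions`, `moment`, `joint_cumulant`) and the representation of a distribution as a dictionary from 0/1 tuples to probabilities are assumptions made for this example.

```python
from itertools import product
from math import factorial, prod


def set_partitions(items):
    """Yield every set partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition


def moment(p, block):
    """E[prod_{i in block} X_i] for 0/1 variables under the joint table `p`.

    For binary variables the product equals 1 exactly when every X_i is 1.
    """
    return sum(prob for x, prob in p.items() if all(x[i] == 1 for i in block))


def joint_cumulant(p, indices):
    """Classical joint cumulant of the variables X_i, i in `indices`."""
    total = 0.0
    for partition in set_partitions(list(indices)):
        k = len(partition)
        # Moebius function of the partition lattice (hypothetical name).
        mobius = (-1) ** (k - 1) * factorial(k - 1)
        total += mobius * prod(moment(p, block) for block in partition)
    return total


# Illustrative inputs (assumed, not taken from the paper):
# three independent fair coins, and two perfectly correlated fair coins.
p_indep = {x: 0.125 for x in product((0, 1), repeat=3)}
p_copy = {(0, 0): 0.5, (1, 1): 0.5}

print(joint_cumulant(p_indep, (0, 1, 2)))
print(joint_cumulant(p_copy, (0, 1)))
```

For two variables the formula reduces to the covariance, and a joint cumulant vanishes whenever the variables split into two mutually independent groups, which is the property that makes cumulant-like coordinates attractive for models defined by conditional independence.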

Article information

Source
Bernoulli, Volume 18, Number 1 (2012), 290-321.

Dates
First available in Project Euclid: 20 January 2012

Permanent link to this document
https://projecteuclid.org/euclid.bj/1327068627

Digital Object Identifier
doi:10.3150/10-BEJ338

Mathematical Reviews number (MathSciNet)
MR2888708

Zentralblatt MATH identifier
1235.62004

Keywords
binary data; central moments; conditional independence; cumulants; general Markov models; graphical models on trees; hidden data; identifiability; Möbius function

Citation

Zwiernik, Piotr; Smith, Jim Q. Tree cumulants and the geometry of binary tree models. Bernoulli 18 (2012), no. 1, 290–321. doi:10.3150/10-BEJ338. https://projecteuclid.org/euclid.bj/1327068627

