The Annals of Statistics

Stratified exponential families: Graphical models and model selection

Dan Geiger, David Heckerman, Henry King, and Christopher Meek


We describe a hierarchy of exponential families which is useful for distinguishing types of graphical models. Undirected graphical models with no hidden variables are linear exponential families (LEFs). Directed acyclic graphical (DAG) models and chain graphs with no hidden variables, including DAG models with several families of local distributions, are curved exponential families (CEFs). Graphical models with hidden variables are what we term stratified exponential families (SEFs). A SEF is a finite union of CEFs of various dimensions satisfying some regularity conditions. We also show that this hierarchy of exponential families is noncollapsing with respect to graphical models by providing a graphical model which is a CEF but not a LEF and a graphical model that is a SEF but not a CEF. Finally, we show how to compute the dimension of a stratified exponential family. These results are discussed in the context of model selection of graphical models.
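As an illustration of what "dimension" can mean for a model with a hidden variable, the sketch below computes the rank of the Jacobian of the map from model parameters to the distribution over the observed variables. This is one common way to obtain a model's dimension at a generic parameter point; the abstract does not spell out the paper's procedure, so treat the approach, the example model (one hidden binary parent with binary children), the parameter values, and all function names as assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def observed_joint(theta, n_children):
    """Distribution over n binary children of one hidden binary parent H.

    theta = [P(H=1), P(X1=1|H=0), P(X1=1|H=1), ..., P(Xn=1|H=0), P(Xn=1|H=1)].
    Returns the probabilities of all 2**n observed configurations,
    with the hidden variable summed out.
    """
    pi = theta[0]
    probs = []
    for xs in product((0, 1), repeat=n_children):
        total = Fraction(0)
        for h, p_h in ((0, 1 - pi), (1, pi)):
            term = p_h
            for i, x in enumerate(xs):
                a = theta[1 + 2 * i + h]  # P(X_{i+1} = 1 | H = h)
                term *= a if x == 1 else 1 - a
            total += term
        probs.append(total)
    return probs


def jacobian(theta, n_children):
    """Exact Jacobian of observed_joint at theta.

    The map is affine in each parameter separately, so the partial
    derivative with respect to theta[k] is f(theta[k]=1) - f(theta[k]=0).
    """
    columns = []
    for k in range(len(theta)):
        hi = list(theta)
        hi[k] = Fraction(1)
        lo = list(theta)
        lo[k] = Fraction(0)
        f_hi = observed_joint(hi, n_children)
        f_lo = observed_joint(lo, n_children)
        columns.append([u - v for u, v in zip(f_hi, f_lo)])
    # Transpose: rows index observed configurations, columns index parameters.
    return [list(row) for row in zip(*columns)]


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def model_dimension(theta, n_children):
    """Jacobian rank at theta (hypothetical helper for this sketch)."""
    return rank(jacobian(theta, n_children))


# Arbitrary illustrative parameter points (not taken from the paper).
theta2 = [Fraction(1, 3), Fraction(1, 5), Fraction(3, 4),
          Fraction(2, 7), Fraction(5, 6)]
theta3 = theta2 + [Fraction(1, 9), Fraction(4, 7)]

print(model_dimension(theta2, 2))  # 3, although the model has 5 parameters
print(model_dimension(theta3, 3))  # 7, equal to the number of parameters
```

With two observed children the five parameters trace out only a three-dimensional set of observed distributions, so a penalty based on the raw parameter count would overstate the model's complexity. Exact rational arithmetic is used here to avoid choosing a numerical rank tolerance.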

Article information

Ann. Statist., Volume 29, Number 2 (2001), 505-529.

First available in Project Euclid: 24 December 2001

Primary: 60E05: Distributions: general theory 62H05: Characterization and structure theory

Keywords: Bayesian networks, graphical models, hidden variables, curved exponential families, stratified exponential families, semialgebraic sets, model selection


Geiger, Dan; Heckerman, David; King, Henry; Meek, Christopher. Stratified exponential families: Graphical models and model selection. Ann. Statist. 29 (2001), no. 2, 505--529. doi:10.1214/aos/1009210550.


Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel