Electronic Journal of Statistics

Consistent model selection of discrete Bayesian networks from incomplete data

Nikolay Balov

Full-text: Open access


A maximum-likelihood-based model selection of discrete Bayesian networks is considered. Structure learning is performed by employing a scoring function $S$ which, for a given network $G$ and $n$-sample $D_{n}$, is defined as the maximum marginal log-likelihood $l$ minus a penalty term $\lambda_{n}h(G)$ proportional to the network complexity $h(G)$, $$S(G|D_{n})=l(G|D_{n})-\lambda_{n}h(G).$$ An available-case analysis is developed in which the standard log-likelihood is replaced by the sum of sample-average node log-likelihoods. The approach utilizes partially missing data records and allows for comparison of models fitted to different samples.
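The score above can be sketched on a toy example. Everything below is an illustrative assumption, not material from the paper: a hypothetical two-node binary network $A \to B$, made-up records with `None` marking missing entries, and MLE plug-in estimates for the node-conditional probabilities. Each node's average log-likelihood uses only the records in which that node and all of its parents are observed, which is what lets partially missing records still contribute.

```python
import math

def node_avg_loglik(data, node, parents):
    """Available-case average log-likelihood of one node: keep only records
    in which the node AND all its parents are observed, plug in the MLE
    (empirical conditional frequencies), and average over those records."""
    cases = [r for r in data
             if r[node] is not None and all(r[p] is not None for p in parents)]
    counts, parent_counts = {}, {}
    for r in cases:
        cfg = tuple(r[p] for p in parents)
        counts[(cfg, r[node])] = counts.get((cfg, r[node]), 0) + 1
        parent_counts[cfg] = parent_counts.get(cfg, 0) + 1
    total = sum(c * math.log(c / parent_counts[cfg])
                for (cfg, _), c in counts.items())
    return total / len(cases)

def score(data, structure, lam):
    """S(G|D_n): sum of node-average log-likelihoods minus lam * h(G).
    For binary nodes, h(G) = sum over nodes of (2-1)*2^{#parents}."""
    ll = sum(node_avg_loglik(data, v, ps) for v, ps in structure.items())
    h = sum(2 ** len(ps) for ps in structure.values())
    return ll - lam * h

# Hypothetical toy network A -> B with partially missing records (MCAR).
data = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 1},
    {"A": 1, "B": 1},
    {"A": None, "B": 1},  # A missing: unusable for A, and for B (parent missing)
    {"A": 1, "B": None},  # B missing: still contributes to node A
]
structure = {"A": (), "B": ("A",)}
n = len(data)
lam_bic = 0.5 * math.log(n) / n  # the BIC rate discussed in the abstract
print(score(data, structure, lam_bic))
```

With `lam = 0` the score reduces to the plain sum of node-average log-likelihoods, so different penalty sequences $\lambda_n$ can be compared on the same fitted likelihood.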

In missing-completely-at-random (MCAR) settings the estimation is shown to be consistent if and only if the sequence $\lambda_{n}$ converges to zero at a rate slower than $n^{-1/2}$. In particular, the BIC model selection ($\lambda_{n}=0.5\log(n)/n$) applied to the node-average log-likelihood is shown to be inconsistent in general. This is in contrast to the complete-data case, in which BIC is known to be consistent. The conclusions are confirmed by numerical experiments.
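The rate condition can be checked numerically: "$\lambda_n \to 0$ slower than $n^{-1/2}$" means $\sqrt{n}\,\lambda_n \to \infty$, whereas for the BIC rate $\lambda_n = 0.5\log(n)/n$ the product $\sqrt{n}\,\lambda_n = 0.5\log(n)/\sqrt{n} \to 0$. A minimal sketch (the rate $n^{-1/3}$ is one arbitrary illustrative choice from the consistent regime, not taken from the paper):

```python
import math

def bic_lambda(n):
    """The BIC penalty rate from the abstract: lambda_n = 0.5*log(n)/n."""
    return 0.5 * math.log(n) / n

def slow_lambda(n):
    """An illustrative rate slower than n^{-1/2}: lambda_n = n^{-1/3}."""
    return n ** (-1.0 / 3.0)

for n in (10**2, 10**4, 10**6):
    # sqrt(n)*lambda_n must diverge for consistency; BIC's product vanishes.
    print(n, math.sqrt(n) * bic_lambda(n), math.sqrt(n) * slow_lambda(n))
```

As $n$ grows, the BIC column shrinks toward zero while the $n^{-1/3}$ column diverges, matching the dichotomy stated above.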

Article information

Electron. J. Statist., Volume 7 (2013), 1047-1077.

First available in Project Euclid: 15 April 2013


Primary: 62F12: Asymptotic properties of estimators
Secondary: 62H12: Estimation

Keywords: Bayesian networks; categorical data; model selection; penalized maximum likelihood; missing completely at random


Balov, Nikolay. Consistent model selection of discrete Bayesian networks from incomplete data. Electron. J. Statist. 7 (2013), 1047--1077. doi:10.1214/13-EJS802. https://projecteuclid.org/euclid.ejs/1366031050


