The Annals of Statistics

Approximating faces of marginal polytopes in discrete hierarchical models

Nanwei Wang, Johannes Rauh, and Hélène Massam

The existence of the maximum likelihood estimate in a hierarchical log-linear model is crucial to the reliability of inference for this model. Determining whether the estimate exists is equivalent to finding whether the sufficient statistics vector $t$ belongs to the boundary of the marginal polytope of the model. The dimension of the smallest face $\mathbf{F}_{t}$ containing $t$ determines the dimension of the reduced model which should be considered for correct inference. For higher-dimensional problems, it is not possible to compute $\mathbf{F}_{t}$ exactly. Massam and Wang (2015) found an outer approximation to $\mathbf{F}_{t}$ using a collection of submodels of the original model. This paper refines the methodology to find an outer approximation and devises a new methodology to find an inner approximation. The inner approximation is given not in terms of a face of the marginal polytope, but in terms of a subset of the vertices of $\mathbf{F}_{t}$.
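The existence criterion above can be written out as a short worked statement. The notation is an assumption made for illustration and is not taken from the paper: $A$ is a design matrix whose columns $a_{i}$ are indexed by the cells $i\in I$ of the contingency table, $n$ is the vector of observed cell counts with total $N$, and $\mathbf{P}$ denotes the marginal polytope.

```latex
\[
  t = A n = \sum_{i \in I} n(i)\, a_{i},
  \qquad
  N = \sum_{i \in I} n(i),
  \qquad
  \mathbf{P} = \operatorname{conv}\{ a_{i} : i \in I \}.
\]
\[
  \text{the MLE exists}
  \iff
  \tfrac{1}{N}\, t \in \operatorname{relint}(\mathbf{P}).
\]
\[
  \text{Otherwise } \tfrac{1}{N}\, t \text{ lies in the relative interior of a proper face }
  \mathbf{F}_{t} \subsetneq \mathbf{P},
  \qquad
  F_{t} = \{ i \in I : a_{i} \in \mathbf{F}_{t} \}.
\]
```

Here $F_{t}$ is the facial set, that is, the set of cells whose columns are vertices of $\mathbf{F}_{t}$. Dividing by $N$ only places $t$ on the scale of the polytope; the same statement can be phrased for $t$ itself in terms of the cone generated by the columns $a_{i}$.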

Knowing $\mathbf{F}_{t}$ exactly indicates which cell probabilities have maximum likelihood estimates equal to $0$. When $\mathbf{F}_{t}$ cannot be obtained exactly, we can use, first, the outer approximation $\mathbf{F}_{2}$ to reduce the dimension of the problem and then the inner approximation $\mathbf{F}_{1}$ to obtain correct estimates of cell probabilities corresponding to elements of $\mathbf{F}_{1}$ and improve the estimates of the remaining probabilities corresponding to elements in $\mathbf{F}_{2}\setminus\mathbf{F}_{1}$. Using both real-world and simulated data, we illustrate our results, and show that our methodology scales to high dimensions.
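As a minimal sketch of how the face determines the zero estimates, consider the independence model for a two-way table, where the smallest face can be read off the margins directly. This is a textbook special case chosen for illustration, not the approximation method of the paper, and the function name `independence_extended_mle` is hypothetical. A cell belongs to the facial set exactly when both its row total and its column total are positive, and the extended estimate is the product of the marginal proportions.

```python
from fractions import Fraction


def independence_extended_mle(counts):
    """Facial set and extended MLE for the independence model.

    counts: rectangular list of lists of nonnegative integer cell counts.
    Returns (mle_exists, facial_set, p_hat).
    """
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("the table must contain at least one observation")

    # Sufficient statistics of the independence model: the two margins.
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]

    # A cell lies in the facial set iff both of its margins are positive.
    facial_set = {
        (i, j)
        for i, r in enumerate(row_sums)
        for j, c in enumerate(col_sums)
        if r > 0 and c > 0
    }

    # Product of marginal proportions; exactly 0 outside the facial set.
    p_hat = [
        [Fraction(r * c, total * total) for c in col_sums]
        for r in row_sums
    ]

    # The MLE exists iff the smallest face is the whole polytope.
    mle_exists = len(facial_set) == len(row_sums) * len(col_sums)
    return mle_exists, facial_set, p_hat


# An empty second row puts the margins on a proper face of the polytope.
exists, face, p = independence_extended_mle([[2, 1], [0, 0]])
print(exists, sorted(face), p[1])
```

Note that zero cell counts alone do not decide existence: for the table `[[1, 0], [0, 1]]` every margin is positive, so the estimate exists even though two cells are empty. For non-decomposable hierarchical models no such closed form is available, which is the setting the inner and outer approximations address.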

Article information

Ann. Statist., Volume 47, Number 3 (2019), 1203-1233.

Received: March 2016
Revised: April 2018
First available in Project Euclid: 13 February 2019

Digital Object Identifier: doi:10.1214/18-AOS1710

Primary: 62H12: Estimation; 62F10: Point estimation; 62F99: None of the above, but in this section

Keywords: Existence of the maximum likelihood estimate; marginal polytope; faces; facial sets; extended maximum likelihood estimate


Wang, Nanwei; Rauh, Johannes; Massam, Hélène. Approximating faces of marginal polytopes in discrete hierarchical models. Ann. Statist. 47 (2019), no. 3, 1203–1233. doi:10.1214/18-AOS1710.

References

  • Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
  • Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, Chichester.
  • Csiszár, I. and Matúš, F. (2008). Generalized maximum likelihood estimates for exponential families. Probab. Theory Related Fields 141 213–246.
  • Csiszár, I. and Shields, P. (2004). Information Theory and Statistics: A Tutorial, 1st ed. Now Publishers, Hanover, MA.
  • Deza, M. M. and Laurent, M. (2010). Geometry of Cuts and Metrics. Algorithms and Combinatorics 15. Springer, Heidelberg.
  • Dobra, A., Erosheva, E. A. and Fienberg, S. E. (2004). Disclosure limitation methods based on bounds for large contingency tables with applications to disability. In Statistical Data Mining and Knowledge Discovery 93–116. Chapman & Hall, Boca Raton, FL.
  • Dobra, A. and Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. Ann. Appl. Stat. 5 969–993.
  • Eriksson, N., Fienberg, S. E., Rinaldo, A. and Sullivant, S. (2006). Polyhedral conditions for the nonexistence of the MLE for hierarchical log-linear models. J. Symbolic Comput. 41 222–233.
  • Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data, 2nd ed. MIT Press, Cambridge, MA.
  • Fienberg, S. E. and Rinaldo, A. (2007). Three centuries of categorical data analysis: Log-linear models and maximum likelihood estimation. J. Statist. Plann. Inference 137 3430–3445.
  • Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist. 40 996–1023.
  • Gawrilow, E. and Joswig, M. (2000). Polymake: A framework for analyzing convex polytopes. In Polytopes—Combinatorics and Computation (Oberwolfach, 1997). DMV Sem. 29 43–73. Birkhäuser, Basel.
  • Geyer, C. J. (2009). Likelihood inference in exponential families and directions of recession. Electron. J. Stat. 3 259–289.
  • Haberman, S. J. (1974). The Analysis of Frequency Data. The Univ. Chicago Press, Chicago, IL.
  • Karwa, V. and Slavković, A. (2016). Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs. Ann. Statist. 44 87–112.
  • Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Oxford Univ. Press, New York.
  • Letac, G. and Massam, H. (2012). Bayes factors and the geometry of discrete hierarchical loglinear models. Ann. Statist. 40 861–890.
  • Liu, Q. and Ihler, A. (2012). Distributed parameter estimation via pseudo-likelihood. Int. Conf. Mach. Learn. (ICML).
  • Massam, H. and Wang, N. (2015). A local approach to estimation in discrete loglinear models. Preprint. Available at arXiv:1504.05434.
  • Massam, H. and Wang, N. (2018). Local conditional and marginal approach to parameter estimation in discrete graphical models. J. Multivariate Anal. 164 1–21.
  • Rauh, J., Kahle, T. and Ay, N. (2011). Support sets in exponential families and oriented matroid theory. Internat. J. Approx. Reason. 52 613–626.
  • Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_{1}$-regularized logistic regression. Ann. Statist. 38 1287–1319.
  • Schmidt, M. (2005). minFunc: Unconstrained differentiable multivariate optimization in Matlab.
  • Vlach, M. (1986). Conditions for the existence of solutions of the three-dimensional planar transportation problem. Discrete Appl. Math. 13 61–78.
  • Wang, N., Rauh, J. and Massam, H. (2019). Supplement to “Approximating faces of marginal polytopes in discrete hierarchical models.” DOI:10.1214/18-AOS1710SUPP.
  • Ziegler, G. M. (1995). Lectures on Polytopes. Graduate Texts in Mathematics 152. Springer, New York.

Supplemental materials

  • Supplement to “Approximating faces of marginal polytopes in discrete hierarchical models.” Appendix A describes the concrete parametrization that we use in the examples. Appendix B discusses the case of two binary variables to illustrate what happens to the usual parameters when the MLE does not exist. Appendix C discusses how to further improve the parametrization $\mu_{L}$ introduced in Section 2. Appendices D and E give further results for the examples from Section 5. Appendix F gives the technical details for the example in Section 6.2.