## The Annals of Statistics

### Maximum likelihood estimation in Gaussian models under total positivity

#### Abstract

We analyze the problem of maximum likelihood estimation for Gaussian distributions that are multivariate totally positive of order two ($\mathrm{MTP}_{2}$). By exploiting connections to phylogenetics and single-linkage clustering, we give a simple proof that the maximum likelihood estimator (MLE) for such distributions exists based on $n\geq2$ observations, irrespective of the underlying dimension. Slawski and Hein [Linear Algebra Appl. 473 (2015) 145–179], who first proved this result, also provided empirical evidence showing that the $\mathrm{MTP}_{2}$ constraint serves as an implicit regularizer and leads to sparsity in the estimated inverse covariance matrix, determining what we name the ML graph. We show that we can find an upper bound for the ML graph by adding edges corresponding to correlations in excess of those explained by the maximum weight spanning forest of the correlation matrix. Moreover, we provide globally convergent coordinate descent algorithms for calculating the MLE under the $\mathrm{MTP}_{2}$ constraint which are structurally similar to iterative proportional scaling. We conclude the paper with a discussion of signed $\mathrm{MTP}_{2}$ distributions.
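
For readers who want a concrete handle on the constraint: a Gaussian distribution with covariance matrix $\Sigma$ is $\mathrm{MTP}_{2}$ exactly when the precision matrix $K=\Sigma^{-1}$ has no positive off-diagonal entries (that is, $K$ is an M-matrix), and the MLE maximizes $\log\det K-\operatorname{tr}(SK)$ over such $K$, where $S$ is the sample covariance matrix. The following is a minimal sketch of these two ingredients, not the algorithm of the paper. The function names `is_gaussian_mtp2` and `gaussian_log_likelihood` and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np


def is_gaussian_mtp2(sigma, tol=1e-10):
    """Return True if the Gaussian with covariance `sigma` is MTP2.

    Hypothetical helper: checks that every off-diagonal entry of the
    precision matrix K = inv(sigma) is nonpositive, up to `tol`.
    Assumes `sigma` is symmetric positive definite.
    """
    sigma = np.asarray(sigma, dtype=float)
    k = np.linalg.inv(sigma)
    off_diagonal = k[~np.eye(k.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal <= tol))


def gaussian_log_likelihood(k, s):
    """Gaussian log-likelihood in K, up to constants: log det K - tr(S K).

    Hypothetical helper: `k` is a candidate precision matrix and `s`
    a sample covariance matrix of the same shape.
    """
    k = np.asarray(k, dtype=float)
    s = np.asarray(s, dtype=float)
    sign, logdet = np.linalg.slogdet(k)
    if sign <= 0:
        raise ValueError("k must be positive definite")
    return logdet - np.trace(s @ k)


# AR(1)-type covariance: the precision matrix is tridiagonal with
# negative off-diagonal entries, so the distribution is MTP2.
ar1 = np.array([[1.0, 0.5, 0.25],
                [0.5, 1.0, 0.5],
                [0.25, 0.5, 1.0]])

# A negative correlation gives a positive off-diagonal precision entry,
# so this distribution is not MTP2.
negative = np.array([[1.0, -0.5],
                     [-0.5, 1.0]])

print(is_gaussian_mtp2(ar1))
print(is_gaussian_mtp2(negative))
```

The check inverts the matrix directly, which is adequate for small well-conditioned examples. It says nothing about how to compute the constrained MLE itself, for which the paper proposes coordinate descent algorithms.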

#### Article information

Source
Ann. Statist., Volume 47, Number 4 (2019), 1835–1863.

Dates
Revised: November 2017
First available in Project Euclid: 21 May 2019

https://projecteuclid.org/euclid.aos/1558425632

Digital Object Identifier
doi:10.1214/17-AOS1668

Mathematical Reviews number (MathSciNet)
MR3953437

Zentralblatt MATH identifier
07082272

#### Citation

Lauritzen, Steffen; Uhler, Caroline; Zwiernik, Piotr. Maximum likelihood estimation in Gaussian models under total positivity. Ann. Statist. 47 (2019), no. 4, 1835–1863. doi:10.1214/17-AOS1668. https://projecteuclid.org/euclid.aos/1558425632

#### References

• [1] Anandkumar, A., Tan, V. Y. F., Huang, F. and Willsky, A. S. (2012). High-dimensional Gaussian graphical model selection: Walk summability and local separation criterion. J. Mach. Learn. Res. 13 2293–2337.
• [2] Bartolucci, F. and Besag, J. (2002). A recursive algorithm for Markov random fields. Biometrika 89 724–730.
• [3] Bartolucci, F. and Forcina, A. (2000). A likelihood ratio test for ${\mathrm{MTP}_{2}}$ within binary variables. Ann. Statist. 28 1206–1218.
• [4] Bhattacharya, B. (2012). Covariance selection and multivariate dependence. J. Multivariate Anal. 106 212–228.
• [5] Bølviken, E. (1982). Probability inequalities for the multivariate normal with nonnegative partial correlations. Scand. J. Stat. 9 49–58.
• [6] Buhl, S. L. (1993). On the existence of maximum likelihood estimators for graphical Gaussian models. Scand. J. Stat. 20 263–270.
• [7] Choi, M. J., Tan, V. Y. F., Anandkumar, A. and Willsky, A. S. (2011). Learning latent tree graphical models. J. Mach. Learn. Res. 12 1771–1812.
• [8] Colangelo, A., Scarsini, M. and Shaked, M. (2005). Some notions of multivariate positive dependence. Insurance Math. Econom. 37 13–26.
• [9] Dellacherie, C., Martinez, S. and San Martin, J. (2014). Inverse M-Matrices and Ultrametric Matrices. Lecture Notes in Math. 2118. Springer, Berlin.
• [10] Dempster, A. P. (1972). Covariance selection. Biometrics 28 157–175.
• [11] Djolonga, J. and Krause, A. (2015). Scalable variational inference in log-supermodular models. In International Conference on Machine Learning (ICML).
• [12] Egilmez, H. E., Pavez, E. and Ortega, A. (2016). Graph learning from data under structural and Laplacian constraints. Available at arXiv:1611.0518.
• [13] Fallat, S., Lauritzen, S. L., Sadeghi, K., Uhler, C., Wermuth, N. and Zwiernik, P. (2017). Total positivity in Markov structures. Ann. Statist. 45 1152–1184.
• [14] Felsenstein, J. (1973). Maximum-likelihood estimation of evolutionary trees from continuous characters. Am. J. Hum. Genet. 25 471–492.
• [15] Fortuin, C. M., Kasteleyn, P. W. and Ginibre, J. (1971). Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22 89–103.
• [16] Gower, J. C. and Ross, G. J. S. (1969). Minimum spanning trees and single linkage cluster analysis. Appl. Statist. 18 54–61.
• [17] Grant, M. and Boyd, S. (2014). CVX: Matlab software for disciplined convex programming, version 2.1. Available at http://cvxr.com/cvx.
• [18] Gross, E. and Sullivant, S. (2018). The maximum likelihood threshold of a graph. Bernoulli 24 386–407.
• [19] Højsgaard, S., Edwards, D. and Lauritzen, S. (2012). Graphical Models with R. Springer, New York.
• [20] Johnson, C. R. and Smith, R. L. (1996). The completion problem for $M$-matrices and inverse $M$-matrices. Linear Algebra Appl. 241–243 655–667.
• [21] Johnson, C. R. and Smith, R. L. (1999). Path product matrices. Linear and Multilinear Algebra 46 177–191.
• [22] Johnson, C. R. and Smith, R. L. (2011). Inverse $M$-matrices, II. Linear Algebra Appl. 435 953–983.
• [23] Karlin, S. and Rinott, Y. (1980). Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. J. Multivariate Anal. 10 467–498.
• [24] Karlin, S. and Rinott, Y. (1981). Total positivity properties of absolute value multinormal variables with applications to confidence interval estimates and related probabilistic inequalities. Ann. Statist. 9 1035–1049.
• [25] Karlin, S. and Rinott, Y. (1983). M-matrices as covariance matrices of multinormal distributions. Linear Algebra Appl. 52 419–438.
• [26] Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.
• [27] Ledermann, W. (1940). On a problem concerning matrices with variable diagonal elements. Proc. Roy. Soc. Edinburgh Sect. A 60 1–17.
• [28] Luo, Z. Q. and Tseng, P. (1992). On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72 7–35.
• [29] Malioutov, D. M., Johnson, J. K. and Willsky, A. S. (2006). Walk-sums and belief propagation in Gaussian graphical models. J. Mach. Learn. Res. 7 2031–2064.
• [30] Malle, B. F. and Horowitz, L. M. (1995). The puzzle of negative self-views: An exploration using the schema concept. J. Pers. Soc. Psychol. 68 470.
• [31] Newman, C. M. (1983). A general central limit theorem for FKG systems. Comm. Math. Phys. 91 75–80.
• [32] Ostrowski, A. (1937). Über die Determinanten mit überwiegender Hauptdiagonale. Comment. Math. Helv. 10 69–96.
• [33] Propp, J. G. and Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. In Proceedings of the Seventh International Conference on Random Structures and Algorithms (Atlanta, GA, 1995) 9 223–252.
• [34] Semple, C. and Steel, M. A. (2003). Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications 24. Oxford Univ. Press, London.
• [35] Shapiro, A. (1988). Towards a unified theory of inequality constrained testing in multivariate analysis. Int. Stat. Rev. 56 49–62.
• [36] Shiers, N., Zwiernik, P., Aston, J. and Smith, J. Q. (2016). The correlation space of Gaussian latent tree models and model selection without fitting. Biometrika 103 531–545.
• [37] Slawski, M. and Hein, M. (2015). Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra Appl. 473 145–179.
• [38] Spearman, C. (1928). The abilities of man. Science 68 38.
• [39] Speed, T. P. and Kiiveri, H. T. (1986). Gaussian Markov distributions over finite graphs. Ann. Statist. 14 138–150.
• [40] Uhler, C. (2012). Geometry of maximum likelihood estimation in Gaussian graphical models. Ann. Statist. 40 238–261.
• [41] Wermuth, N. and Scheidt, E. (1977). Algorithm AS 105: Fitting a covariance selection model to a matrix. J. R. Stat. Soc. Ser. C. Appl. Stat. 26 88–92.
• [42] Zwiernik, P. (2016). Semialgebraic Statistics and Latent Tree Models. Monographs on Statistics and Applied Probability 146. Chapman & Hall/CRC, Boca Raton, FL.