## The Annals of Statistics

### High-dimensional structure estimation in Ising models: Local separation criterion

#### Abstract

We consider the problem of high-dimensional Ising (graphical) model selection. We propose a simple algorithm for structure estimation based on the thresholding of the empirical conditional variation distances. We introduce a novel criterion for tractable graph families, where this method is efficient, based on the presence of sparse local separators between node pairs in the underlying graph. For such graphs, the proposed algorithm has a sample complexity of $n=\Omega(J_{\min}^{-2}\log p)$, where $p$ is the number of variables, and $J_{\min}$ is the minimum (absolute) edge potential in the model. We also establish nonasymptotic necessary and sufficient conditions for structure estimation.

#### Article information

Source
Ann. Statist., Volume 40, Number 3 (2012), 1346-1375.

Dates
First available in Project Euclid: 10 August 2012

https://projecteuclid.org/euclid.aos/1344610586

Digital Object Identifier
doi:10.1214/12-AOS1009

Mathematical Reviews number (MathSciNet)
MR3015028

Zentralblatt MATH identifier
1297.62124

Subjects
Primary: 62H12: Estimation

#### Citation

Anandkumar, Animashree; Tan, Vincent Y. F.; Huang, Furong; Willsky, Alan S. High-dimensional structure estimation in Ising models: Local separation criterion. Ann. Statist. 40 (2012), no. 3, 1346--1375. doi:10.1214/12-AOS1009. https://projecteuclid.org/euclid.aos/1344610586

#### References

• [1] Abbeel, P., Koller, D. and Ng, A. Y. (2006). Learning factor graphs in polynomial time and sample complexity. J. Mach. Learn. Res. 7 1743–1788.
• [2] Albert, R. and Barabási, A.-L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47–97.
• [3] Anandkumar, A., Tan, V. Y. F., Huang, F. and Willsky, A. S. (2011). High-dimensional Gaussian graphical model selection: Tractable graph families. Preprint. Available at arXiv:1107.1270.
• [4] Anandkumar, A., Tan, V. Y. F., Huang, F. and Willsky, A. S. (2012). Supplement to “High-dimensional structure learning of Ising models: Local separation criterion.” DOI:10.1214/12-AOS1009SUPP.
• [5] Bayati, M., Montanari, A. and Saberi, A. (2009). Generating random graphs with large girth. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms 566–575. SIAM, Philadelphia, PA.
• [6] Bento, J. and Montanari, A. (2009). Which graphical models are difficult to learn? In Proc. of Neural Information Processing Systems (NIPS).
• [7] Bogdanov, A., Mossel, E. and Vadhan, S. (2008). The complexity of distinguishing Markov random fields. In Approximation, Randomization and Combinatorial Optimization. Lecture Notes in Comput. Sci. 5171 331–342. Springer, Berlin.
• [8] Bollobás, B. (1985). Random Graphs. Academic Press, London.
• [9] Brémaud, P. (1999). Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Texts in Applied Mathematics 31. Springer, New York.
• [10] Bresler, G., Mossel, E. and Sly, A. (2008). Reconstruction of Markov random fields from samples: Some observations and algorithms. In Approximation, Randomization and Combinatorial Optimization. Lecture Notes in Computer Science 5171 343–356. Springer, Berlin.
• [11] Chandrasekaran, V., Parrilo, P. A. and Willsky, A. S. (2010). Latent variable graphical model selection via convex optimization. Ann. Statist. To appear. Preprint. Available on ArXiv.
• [12] Chechetka, A. and Guestrin, C. (2007). Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems (NIPS).
• [13] Cheng, J., Greiner, R., Kelly, J., Bell, D. and Liu, W. (2002). Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence 137 43–90.
• [14] Choi, M. J., Lim, J. J., Torralba, A. and Willsky, A. S. (2010). Exploiting hierarchical context on a large database of object categories. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
• [15] Choi, M. J., Tan, V. Y. F., Anandkumar, A. and Willsky, A. S. (2011). Learning latent tree graphical models. J. Mach. Learn. Res. 12 1771–1812.
• [16] Chow, C. and Liu, C. (1968). Approximating Discrete Probability Distributions with Dependence Trees. IEEE Tran. on Information Theory 14 462–467.
• [17] Chung, F. R. K. (1997). Spectral Graph Theory. CBMS Regional Conference Series in Mathematics 92. Published for the Conference Board of the Mathematical Sciences, Washington, DC.
• [18] Chung, F. R. K. and Lu, L. (2006). Complex Graphs and Network. Amer. Math. Soc., Providence, RI.
• [19] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory, 2nd ed. Wiley, Hoboken, NJ.
• [20] Dommers, S., Giardinà, C. and van der Hofstad, R. (2010). Ising models on power-law random graphs. J. Stat. Phys. 141 1–23.
• [21] Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G. (1999). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, Cambridge.
• [22] Eppstein, D. (2000). Diameter and treewidth in minor-closed graph families. Algorithmica 27 275–291.
• [23] Galam, S. (1997). Rational group decision making: A random field Ising model at $\mathrmT=0$. Physica A: Statistical and Theoretical Physics 238 66–80.
• [24] Gamburd, A., Hoory, S., Shahshahani, M., Shalev, A. and Virág, B. (2009). On the girth of random Cayley graphs. Random Structures Algorithms 35 100–117.
• [25] Grabowski, A. and Kosinski, R. (2006). Ising-based model of opinion formation in a complex network of interpersonal interactions. Physica A: Statistical Mechanics and Its Applications 361 651–664.
• [26] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 613–636.
• [27] Karger, D. and Srebro, N. (2001). Learning Markov networks: Maximum bounded tree-width graphs. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (Washington, DC, 2001) 392–401. SIAM, Philadelphia, PA.
• [28] Kearns, M. J. and Vazirani, U. V. (1994). An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA.
• [29] Kloks, T. (1994). Only few graphs have bounded treewidth. Springer Lecture Notes in Computer Science 842 51–60.
• [30] Laciana, C. E. and Rovere, S. L. (2010). Ising-like agent-based technology diffusion model: Adoption patterns vs. seeding strategies. Physica A: Statistical Mechanics and Its Applications 390 1139–1149.
• [31] Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Oxford Univ. Press, New York.
• [32] Levin, D. A., Peres, Y. and Wilmer, E. L. (2008). Markov Chains and Mixing Times. Amer. Math. Soc., Providence, RI.
• [33] Liu, H., Xu, M., Gu, H., Gupta, A., Lafferty, J. and Wasserman, L. (2011). Forest density estimation. J. Mach. Learn. Res. 12 907–951.
• [34] Liu, S., Ying, L. and Shakkottai, S. (2010). Influence maximization in social networks: An ising-model-based approach. In Proc. 48th Annual Allerton Conference on Communication, Control, and Computing.
• [35] Lovász, L., Neumann Lara, V. and Plummer, M. (1978). Mengerian theorems for paths of bounded length. Period. Math. Hungar. 9 269–276.
• [36] McKay, B. D., Wormald, N. C. and Wysocka, B. (2004). Short cycles in random regular graphs. Electron. J. Combin. 11 Research Paper 66, 12 pp. (electronic).
• [37] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [38] Mitliagkas, I. and Vishwanath, S. (2010). Strong information-theoretic limits for source/model recovery. In Proc. 48th Annual Allerton Conference on Communication, Control and Computing.
• [39] Netrapalli, P., Banerjee, S., Sanghavi, S. and Shakkottai, S. (2010). Greedy learning of Markov network structure. In Proc. 48th Annual Allerton Conference on Communication, Control and Computing.
• [40] Newman, M. E. J., Watts, D. J. and Strogatz, S. H. (2002). Random graph models of social networks. Proc. Natl. Acad. Sci. USA 99 2566–2572.
• [41] Ravikumar, P., Wainwright, M. J. and Lafferty, J. (2010). High-dimensional Ising model selection using $\ell_1$-regularized logistic regression. Ann. Statist. 38 1287–1319.
• [42] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
• [43] Santhanam, N. P. and Wainwright, M. J. (2008). Information-theoretic limits of high-dimensional model selection. In International Symposium on Information Theory.
• [44] Spirtes, P. and Meek, C. (1995). Learning Bayesian networks with discrete variables from data. In Proc. of Intl. Conf. on Knowledge Discovery and Data Mining 294–299.
• [45] Tan, V. Y. F., Anandkumar, A., Tong, L. and Willsky, A. S. (2011). A large-deviation analysis of the maximum-likelihood learning of Markov tree structures. IEEE Trans. Inform. Theory 57 1714–1735.
• [46] Tan, V. Y. F., Anandkumar, A. and Willsky, A. S. (2010). Learning Gaussian tree models: Analysis of error exponents and extremal structures. IEEE Trans. Signal Process. 58 2701–2714.
• [47] Tan, V. Y. F., Anandkumar, A. and Willsky, A. S. (2011). Learning high-dimensional Markov forest distributions: Analysis of error rates. J. Mach. Learn. Res. 12 1617–1653.
• [48] Vega-Redondo, F. (2007). Complex Social Networks. Econometric Society Monographs 44. Cambridge Univ. Press, Cambridge.
• [49] Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1 1–305.
• [50] Wang, W., Wainwright, M. J. and Ramchandran, K. (2010). Information-theoretic bounds on model selection for Gaussian Markov random fields. In IEEE International Symposium on Information Theory Proceedings (ISIT).
• [51] Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature 393 440–442.
• [52] Graphical Model of Senate Voting. http://www.eecs.berkeley.edu/~elghaoui/StatNews/ex_senate.html.