The Annals of Statistics

High-dimensional Ising model selection using ℓ1-regularized logistic regression

Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty



We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d³ log p) with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d² log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on binary graphical models, we indicate how a generalization of the method would apply to general discrete Markov random fields.
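The neighborhood-selection scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: an ℓ1-regularized logistic regression of each node on all remaining nodes is fit by proximal gradient descent (ISTA, used here in place of the interior-point or coordinate methods one would use in practice), and the support of the estimated coefficient vector defines that node's estimated neighborhood. The regularization level `lam`, the step size `lr`, and the support threshold `tol` are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def l1_logistic(X, y, lam, lr=0.5, iters=2000):
    """Fit L1-regularized logistic regression by proximal gradient
    (ISTA) with an unpenalized intercept. Returns (weights, intercept)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(iters):
        # Residuals of the logistic model: sigmoid(Xw + b) - y.
        r = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y
        w -= lr * (X.T @ r) / n          # gradient step on the loss
        b -= lr * r.mean()               # intercept is not penalized
        # Soft-thresholding: the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b

def select_neighborhoods(samples, lam, tol=1e-3):
    """Estimate each node's neighborhood in a binary MRF by regressing
    it on all other nodes and keeping coordinates with nonzero weight.
    samples: (n, p) array with entries in {0, 1}."""
    n, p = samples.shape
    est = {}
    for s in range(p):
        others = np.delete(np.arange(p), s)
        w, _ = l1_logistic(samples[:, others], samples[:, s], lam)
        est[s] = set(others[np.abs(w) > tol])
    return est
```

In practice the per-node estimates need not agree across an edge, and they can be combined into a single undirected graph by an AND rule (keep an edge only if both endpoints select it) or an OR rule (keep it if either does).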

Article information

Ann. Statist., Volume 38, Number 3 (2010), 1287-1319.

First available in Project Euclid: 8 March 2010


Primary: 62F12: Asymptotic properties of estimators
Secondary: 68T99: None of the above, but in this section

Keywords: Graphical models; Markov random fields; structure learning; ℓ1-regularization; model selection; convex risk minimization; high-dimensional asymptotics


Ravikumar, Pradeep; Wainwright, Martin J.; Lafferty, John D. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Ann. Statist. 38 (2010), no. 3, 1287–1319. doi:10.1214/09-AOS691.


