The Annals of Statistics

Consistency of community detection in networks under degree-corrected stochastic block models

Yunpeng Zhao, Elizaveta Levina, and Ji Zhu

Full-text: Open access

Abstract

Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.

Article information

Source
Ann. Statist., Volume 40, Number 4 (2012), 2266-2292.

Dates
First available in Project Euclid: 23 January 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1358951382

Digital Object Identifier
doi:10.1214/12-AOS1036

Mathematical Reviews number (MathSciNet)
MR3059083

Zentralblatt MATH identifier
1257.62095

Subjects
Primary: 62G20: Asymptotic properties

Keywords
Community detection degree-corrected stochastic block models consistency

Citation

Zhao, Yunpeng; Levina, Elizaveta; Zhu, Ji. Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 (2012), no. 4, 2266--2292. doi:10.1214/12-AOS1036. https://projecteuclid.org/euclid.aos/1358951382


Export citation

References

  • [1] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US Election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36-43. ACM, New York.
  • [2] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
  • [3] Airoldi, E. M. and Choi, D. (2011). Summary of proof in “A nonparametric view of network models and Newman–Girvan and other modularities.” Personal communication.
  • [4] Beasley, J. E. (1998). Heuristic algorithms for the unconstrained binary quadratic programming problem. Technical report, Management School, Imperial College, London, UK.
  • [5] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
  • [6] Bickel, P. J. and Chen, A. (2012). Weak consistency of community detection criteria under the stochastic block model. Unpublished manuscript.
  • [7] Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with growing number of classes. Biometrika 99 273–284.
  • [8] Coja-Oghlan, A. and Lanka, A. (2010). Finding planted partitions in random graphs with general degree distributions. SIAM J. Discrete Math. 23 1682–1714.
  • [9] Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal Complex Systems 1695.
  • [10] Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2012). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84 066106.
  • [11] Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75–174.
  • [12] Fruchterman, T. M. J. and Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience 21 1129–1164.
  • [13] Getoor, L. and Diehl, C. P. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter 7 3–12.
  • [14] Glover, F. W. and Lagunas, M. (1997). Tabu Search. Kluwer Academic, Norwell.
  • [15] Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2 129–233.
  • [16] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301–354.
  • [17] Hoff, P. D. (2007). Modeling homophily and stochastic equivalence in symmetric relational data. In Advances in Neural Information Processing Systems, 19 MIT Press, Cambridge, MA.
  • [18] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.
  • [19] Hubert, L. and Arabie, P. (1985). Comparing partitions. J. Classification 2 193–218.
  • [20] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107.
  • [21] McCulloch, C. E. and Searle, S. R. (2001). Generalized, Linear, and Mixed Models. Wiley-Interscience, New York.
  • [22] Newman, M. E. J. (2004). Detecting community structure in networks. Eur. Phys. J. B 38 321–330.
  • [23] Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577–8582.
  • [24] Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E (3) 74 036104, 19.
  • [25] Newman, M. E. J. (2010). Networks: An Introduction. Oxford Univ. Press, Oxford.
  • [26] Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E 69 026113.
  • [27] Newman, M. E. J. and Leicht, E. A. (2007). Mixture models and exploratory analysis in networks. Proc. Natl. Acad. Sci. USA 104 9564–9569.
  • [28] Ng, A., Jordan, M. and Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Neural Information Processing Systems 14 (T. Dietterich, S. Becker and Z. Ghahramani, eds.) 849–856. MIT Press, Cambridge.
  • [29] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077–1087.
  • [30] Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at arXiv:1201.5871v1.
  • [31] Robins, G., Snijders, T., Wang, P., Handcock, M. and Pattison, P. (2007). Recent developments in exponential random graphs models ($p^\ast$) for social networks. Social Networks 29 192–215.
  • [32] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
  • [33] Schlitt, T. and Brazma, A. (2007). Current approaches to gene regulatory network modelling. BMC Bioinformatics 8 S9. Suppl 6.
  • [34] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 888–905.
  • [35] Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classification 14 75–100.
  • [36] Wang, Y. J. and Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. J. Amer. Statist. Assoc. 82 8–19.
  • [37] Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge Univ. Press, Cambridge.
  • [38] Wei, Y. C. and Cheng, C. K. (1989). Toward efficient hierarchical designs by ratio cut partitioning. In Proceedings of the IEEE International Conference on Computer Aided Design 298–301. IEEE, New York.
  • [39] Zhang, S. and Zhao, H. (2012). Community identification in networks with unbalanced structure. Phys. Rev. E 85 066114.
  • [40] Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natl. Acad. Sci. USA 108 7321–7326.
  • [41] Zhao, Y., Levina, E. and Zhu, J. (2012). Supplement to “Consistency of community detection in networks under degree-corrected stochastic block models.” DOI:10.1214/12-AOS1036SUPP.

Supplemental materials