Electronic Journal of Statistics

Bayesian degree-corrected stochastic blockmodels for community detection

Lijun Peng and Luis Carvalho

Full-text: Open access

Abstract

Community detection in networks has drawn much attention in diverse fields, especially social sciences. Given its significance, there has been a large body of literature with approaches from many fields. Here we present a statistical framework that is representative, extensible, and that yields an estimator with good properties. Our proposed approach considers a stochastic blockmodel based on a logistic regression formulation with node correction terms. We follow a Bayesian approach that explicitly captures the community behavior via prior specification. We further adopt a data augmentation strategy with latent Pólya-Gamma variables to obtain posterior samples. We conduct inference based on a principled, canonically mapped centroid estimator that formally addresses label non-identifiability and captures representative community assignments. We demonstrate the proposed model and estimation on real-world as well as simulated benchmark networks and show that the proposed model and estimator are more flexible, representative, and yield smaller error rates when compared to the MAP estimator from classical degree-corrected stochastic blockmodels.

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 2746-2779.

Dates
Received: December 2014
First available in Project Euclid: 20 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1474373834

Digital Object Identifier
doi:10.1214/16-EJS1163

Mathematical Reviews number (MathSciNet)
MR3549018

Zentralblatt MATH identifier
06628776

Keywords
Community detection label non-identifiability canonical remapping centroid estimation

Citation

Peng, Lijun; Carvalho, Luis. Bayesian degree-corrected stochastic blockmodels for community detection. Electron. J. Statist. 10 (2016), no. 2, 2746--2779. doi:10.1214/16-EJS1163. https://projecteuclid.org/euclid.ejs/1474373834


Export citation

References

  • Adamic, L. and N. Glance (2005). The political blogosphere and the 2004 US election: Divided they blog. In, Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43.
  • Airoldi, E. M., D. M. Blei, S. E. Fienberg, and E. P. Xing (2008). Mixed membership stochastic blockmodels., The Journal of Machine Learning Research 9, 1981–2014.
  • Albert, R. and A.-L. Barabási (2002, Jan). Statistical mechanics of complex networks., Rev. Mod. Phys. 74, 47–97.
  • Anderson, C., S. Wasserman, and K. Faust (1992). Building stochastic blockmodels., Social Networks 14(1), 137–161.
  • Barbieri, M. and J. Berger (2004). Optimal predictive model selection., The Annals of Statistics 32(3), 870–897.
  • Barnes, E. (1982). An algorithm for partitioning the nodes of a graph., SIAM Journal on Algebraic Discrete Methods 3(4), 541–550.
  • Besag, J. (1986). On the statistical analysis of dirty pictures., Journal of the Royal Statistical Society. Series B (Methodological) 48(3), 259–302.
  • Bickel, P. and A. Chen (2009). A nonparametric view of network models and Newman–Girvan and other modularities., Proceedings of the National Academy of Sciences 106(50), 21068–21073.
  • Bickel, P., D. Choi, X. Chang, and H. Zhang (2013, 08). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels., The Annals of Statistics 41(4), 1922–1943.
  • Binder, D. A. (1978). Bayesian cluster analysis., Biometrika 65(1), 31–38.
  • Binder, D. A. (1981). Approximations to Bayesian clustering rules., Biometrika 68(1), 275–285.
  • Blondel, V. D., J.-L. Guillaume, R. Lambiotte, and E. Lefebvre (2008). Fast unfolding of communities in large networks., Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008.
  • Brandes, U., D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner (2007). On finding graph clusterings with maximum modularity. In, Graph-Theoretic Concepts in Computer Science, pp. 121–132. Springer.
  • Carvalho, L. and C. Lawrence (2008). Centroid estimation in discrete high-dimensional spaces with applications in biology., Proceedings of the National Academy of Sciences 105(9), 3209–3214.
  • Celisse, A., J.-J. Daudin, and L. Pierre (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model., Electronic Journal of Statistics 6, 1847–1899.
  • Choi, D. S., P. J. Wolfe, and E. M. Airoldi (2012). Stochastic blockmodels with a growing number of classes., Biometrika.
  • Clauset, A., M. E. J. Newman, and C. Moore (2004, August). Finding community structure in very large networks., Physical Review E 70(6), 066111.
  • Danon, L., A. Díaz-Guilera, and J. Duch (2005). Comparing community structure identification., Journal of Statistical Mechanics: Theory and Experiment, 09008.
  • Daudin, J. J., F. Picard, and S. Robin (2008, June). A mixture model for random graphs., Statistics and Computing 18(2), 173–183.
  • Donath, E. and J. Hoffman (1973). Lower bounds for the partitioning of graphs., IBM J. Res. Dev. 17(5), 420–425.
  • Duch, J. and A. Arenas (2005). Community identification using extremal optimization., Physical Review E 72, 027104.
  • Fienberg, S. E., M. M. Meyer, and S. S. Wasserman (1985). Statistical analysis of multiple sociometric relations., Journal of the American Statistical Association 80(389), 51–67.
  • Fienberg, S. E. and S. Wasserman (1981). An exponential family of probability distributions for directed graphs: Comment., Journal of the American Statistical Association 76(373), 54–57.
  • Fortunato, S. and M. Barthelemy (2007). Resolution limit in community detection., Proceedings of the National Academy of Sciences 104(1), 36–41.
  • Fosdick, B. and P. Hoff (2013). Testing and modeling dependencies between a network and nodal attributes., arXiv:1306.4708v1.
  • Fritsch, A. and K. Ickstadt (2009). Improved criteria for clustering based on the posterior similarity matrix., Bayesian Analysis 4(2), 367–392.
  • Gelfand, A. E. and S. K. Ghosh (1998). Model choice: a minimum posterior predictive loss approach., Biometrika 85(1), 1–11.
  • Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2003)., Bayesian Data Analysis. CRC press.
  • Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images., IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
  • Goodnight, J. H. (1979). A tutorial on the sweep operator., The American Statistician 33(3), 149–158.
  • Gower, J. (1966). Some distance properties of latent root and vector methods used in multivariate analysis., Biometrika 53(3-4), 325–338.
  • Hancock, T., I. Takigawa, and H. Mamitsuka (2010). Mining metabolic pathways through gene expression., Bioinformatics 26(17), 2128–2135.
  • Handcock, M., A. Raftery, and J. Tantrum (2007). Model-based clustering for social networks., Journal of the Royal Statistical Society: Series A 170(2), 301–354.
  • Hastie, T., R. Tibshirani, and J. Friedman (2001). Maximum likelihood from incomplete data via the em algorithm., The Elements of Statistical Learning, 520–528.
  • Hoff, P., A. Raftery, and M. Handcock (2002). Latent space approaches to social network analysis., Journal of the American Statistical Association 97(460), 1090–1098.
  • Hoff, P., A. Raftery, and M. Handcock (2005). Bilinear mixed-effects models for dyadic data., Journal of the American Statistical Association 100(469), 286–295.
  • Hofman, J. and C. Wiggins (2008). Bayesian approach to network modularity., Physical Review Letters 100(25), 258701.
  • Holland, P. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs., Journal of the American Statistical Association 76(373), 33–50.
  • Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps., Social networks 5(2), 109–137.
  • Jin, J. (2015). Fast community detection by SCORE., The Annals of Statistics 43(1), 57–89.
  • Karrer, B. and M. Newman (2011). Stochastic blockmodels and community structure in networks., Physical Review E 83(1), 016107.
  • Kernighan, B. and S. Lin (1970). An efficient heuristic procedure for partitioning graphs., Bell Sys. Tech. J. 49(2), 291–308.
  • Kim, M. and J. Leskovec (2011). Modeling social networks with node attributes using the multiplicative attribute graph model., UAI 7AUAI Press, 400–409.
  • Lancichinetti, A., S. Fortunato, and F. Radicchi (2008). Benchmark graphs for testing community detection algorithms., Physical Review E 78(1), 046110.
  • Lau, J. W. and P. J. Green (2007). Bayesian model-based clustering procedures., Journal of Computational and Graphical Statistics 16(3), 526–558.
  • Lorrain, F. and H. C. White (1971). Structural equivalence of individuals in social networks., The Journal of Mathematical Sociology 1(1), 49–80.
  • Mariadassou, M., S. Robin, and C. Vacher (2010, 06). Uncovering latent structure in valued graphs: A variational approach., The Annals of Applied Statistics 4(2), 715–742.
  • McCullagh, P. and J. A. Nelder (1989)., Generalized Linear Models, Volume 37. CRC Press.
  • Newman, M. (2002, Oct). Assortative mixing in networks., Phys. Rev. Lett. 89, 208701.
  • Newman, M. (2004). Fast algorithm for detecting community structure in networks., Physical Review E 69(6), 066133.
  • Newman, M. (2006). Modularity and community structure in networks., Proceedings of the National Academy of Sciences 103(23), 8577–8582.
  • Newman, M. and M. Girvan (2004). Finding and evaluating community structure in networks., Physical Review E 69(2), 026113.
  • Newman, M. E. J. (2003). Mixing patterns in networks., Phys. Rev. E (67).
  • Nocedal, J. and S. J. Wright (2006)., Numerical Optimization (2nd ed.). Springer-Verlag.
  • Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures., Journal of the American Statistical Association 96(455), 1077–1087.
  • Parthasarathy, S., Y. Ruan, and V. Satuluri (2011). Community discovery in social networks: Applications, methods and emerging trends. In, Social Network Data Analytics, pp. 79–113. Springer.
  • Peng, L. and Carvalho, L. (2016). Supplementary material for “Bayesian degree-corrected stochastic blockmodels for community detection”. DOI:, 10.1214/16-EJS1163SUPP.
  • Polson, N. G., J. G. Scott, and J. Windle (2012). Bayesian inference for logistic models using polya-gamma latent variables., arXiv:1205.0310.
  • Pons, P. and M. Latapy (2004). Computing communities in large networks using random walks., J. of Graph Alg. and App. 10, 284–293.
  • Qin, T. and K. Rohe (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. In, Advances in Neural Information Processing Systems, pp. 3120–3128.
  • Raghavan, U. N., R. Albert, and S. Kumara (2007). Near linear time algorithm to detect community structures in large-scale networks., Physical Review E 76(3).
  • Robert, C. and G. Casella (1999)., Monte Carlo Statistical Methods. Springer New York.
  • Robins, G., P. Pattison, Y. Kalish, and D. Lusher (2007). An introduction to exponential random graph ($p^*$) models for social networks., Social networks 29(2), 173–191.
  • Rohe, K., S. Chatterjee, and B. Yu (2011). Spectral clustering and the high-dimensional stochastic blockmodel., The Annals of Statistics 39(4), 1878–1915.
  • Sampson, S. F. (1968)., A novitiate in a period of change: an experimental and case study of social relationships. Ph. D. thesis, Cornell University, September.
  • Snijders, T. A. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure., Journal of Classification 14(1), 75–100.
  • Stephens, M. (2000). Dealing with label switching in mixture models., Journal of the Royal Statistical Society. Series B 62(4), 795–809.
  • Tallberg, C. (2005). A Bayesian approach to modeling stochastic blockstructures with covariates., Journal of Mathematical Sociology 29, 1–23.
  • Vázquez, A. (2003). Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations., Physical Review E 67(5), 056104.
  • Von Luxburg, U. (2007). A tutorial on spectral clustering., Statistics and computing 17(4), 395–416.
  • Vu, D. Q., D. R. Hunter, and M. Schweinberger (2013, 06). Model-based clustering of large networks., Annals of Applied Statistics 7(2), 1010–1039.
  • Wang, Y. J. and G. Y. Wong (1987). Stochastic blockmodels for directed graphs., Journal of the American Statistical Association 82(397), 8–19.
  • Yan, X., C. Shalizi, J. E. Jensen, F. Krzakala, C. Moore, L. Zdeborová, P. Zhang, and Y. Zhu (2014). Model selection for degree-corrected block models., Journal of Statistical Mechanics: Theory and Experiment, P05007.
  • Zachary, W. W. (1977). An information flow model for conflict and fission in small groups., Journal of Anthropological Research 33(4), 452–473.
  • Zanghi, H., F. Picard, V. Miele, and C. Ambroise (2010, 06). Strategies for online inference of model-based clustering in large and growing networks., The Annals of Applied Statistics 4(2), 687–714.
  • Zhao, Y., E. Levina, J. Zhu, et al. (2012). Consistency of community detection in networks under degree-corrected stochastic block models., The Annals of Statistics 40(4), 2266–2292.

Supplemental materials