The Annals of Statistics

Community detection in dense random networks

Ery Arias-Castro and Nicolas Verzelen


Abstract

We formalize the problem of detecting a community in a network as testing whether a given (random) graph contains a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $N$ nodes. Under the null hypothesis, the graph is a realization of an Erdős–Rényi graph with connection probability $p_{0}$. Under the (composite) alternative, there is an unknown subgraph of $n$ nodes where the probability of connection is $p_{1}>p_{0}$. We derive a lower bound for detecting such a subgraph in terms of $N$, $n$, $p_{0}$, $p_{1}$ and exhibit a test that achieves that lower bound. We do this both when $p_{0}$ is known and when it is unknown. We also consider the problem of testing in polynomial time. As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem. Our focus in this paper is on the quasi-normal regime where $np_{0}$ is either bounded away from zero, or tends to zero slowly.
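The testing setup described above can be illustrated with a small simulation. The following is a minimal sketch, not the test analyzed in the paper: the function names `sample_graph` and `total_edge_z` are hypothetical, and the standardized total edge count is an illustrative statistic that assumes $p_{0}$ is known.

```python
import math
import random


def sample_graph(N, p0, n=0, p1=None, seed=0):
    """Sample an undirected, unweighted graph as a set of edges (i, j) with i < j.

    With n == 0 this is the null model: each pair is connected with probability p0.
    With n > 0, pairs inside a hidden set of n nodes are connected with probability p1,
    and all other pairs with probability p0.
    """
    rng = random.Random(seed)
    community = set(rng.sample(range(N), n)) if n > 0 else set()
    edges = set()
    for i in range(N):
        for j in range(i + 1, N):
            inside = i in community and j in community
            p = p1 if inside else p0
            if rng.random() < p:
                edges.add((i, j))
    return edges, community


def total_edge_z(edges, N, p0):
    """Standardized total edge count, centered and scaled under the null with known p0."""
    m = N * (N - 1) // 2  # number of node pairs
    return (len(edges) - m * p0) / math.sqrt(m * p0 * (1 - p0))
```

Calling `total_edge_z(sample_graph(60, 0.1)[0], 60, 0.1)` gives a value that is approximately standard normal under the null, whereas a graph drawn with a large, dense hidden subgraph (for instance `n=30`, `p1=0.9`) yields a much larger value. A total-count statistic of this kind is only sensitive to large communities; scanning over candidate subsets of $n$ nodes, as the scan statistic in the keywords suggests, is the natural complement for small ones.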

Article information

Source
Ann. Statist., Volume 42, Number 3 (2014), 940-969.

Dates
First available in Project Euclid: 20 May 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1400592648

Digital Object Identifier
doi:10.1214/14-AOS1208

Mathematical Reviews number (MathSciNet)
MR3210992

Zentralblatt MATH identifier
1246.62213

Subjects
Primary: 62C20: Minimax procedures
62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
94A13: Detection theory

Keywords
Community detection; detecting a dense subgraph; minimax hypothesis testing; Erdős–Rényi random graph; scan statistic; planted clique problem; dense $k$-subgraph problem; sparse eigenvalue problem

Citation

Arias-Castro, Ery; Verzelen, Nicolas. Community detection in dense random networks. Ann. Statist. 42 (2014), no. 3, 940–969. doi:10.1214/14-AOS1208. https://projecteuclid.org/euclid.aos/1400592648



