Electronic Journal of Statistics

Estimating a network from multiple noisy realizations

Can M. Le, Keith Levin, and Elizaveta Levina

Full-text: Open access

Abstract

Complex interactions between entities are often represented as edges in a network. In practice, the network is often constructed from noisy measurements and inevitably contains some errors. In this paper we consider the problem of estimating a network from multiple noisy observations where edges of the original network are recorded with both false positives and false negatives. This problem is motivated by neuroimaging applications where brain networks of a group of patients with a particular brain condition could be viewed as noisy versions of an unobserved true network corresponding to the disease. The key to optimally leveraging these multiple observations is to take advantage of network structure, and here we focus on the case where the true network contains communities. Communities are common in real networks in general and in particular are believed to be presented in brain networks. Under a community structure assumption on the truth, we derive an efficient method to estimate the noise levels and the original network, with theoretical guarantees on the convergence of our estimates. We show on synthetic networks that the performance of our method is close to an oracle method using the true parameter values, and apply our method to fMRI brain data, demonstrating that it constructs stable and plausible estimates of the population network.

Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 4697-4740.

Dates
Received: October 2017
First available in Project Euclid: 22 December 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1545448230

Digital Object Identifier
doi:10.1214/18-EJS1521

Mathematical Reviews number (MathSciNet)
MR3894068

Zentralblatt MATH identifier
07003255

Subjects
Primary: 62H12: Estimation
Secondary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20] 62F12: Asymptotic properties of estimators

Keywords
Noisy networks stochastic block model brain networks EM algorithm

Rights
Creative Commons Attribution 4.0 International License.

Citation

Le, Can M.; Levin, Keith; Levina, Elizaveta. Estimating a network from multiple noisy realizations. Electron. J. Statist. 12 (2018), no. 2, 4697--4740. doi:10.1214/18-EJS1521. https://projecteuclid.org/euclid.ejs/1545448230


Export citation

References

  • [1] E. Abbe. Community detection and stochastic block models: Recent developments., Journal of Machine Learning Research, 18(177):1–86, 2018.
  • [2] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina. Fitting community models to large sparse networks., Annals of Statistics, 41(4) :2097–2122, 2013.
  • [3] S. Balakrishnan, M. J. Wainwright, and B. Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis., Annals of Statistics, 45(1):77–120, 2017.
  • [4] D. S. Bassett, B. G. Nelson, B. A. Mueller, J. Camchong, and K. O. Lim. Altered resting state complexity in schizophrenia., Neuroimage, 59(3) :2196–2207, 2012.
  • [5] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities., Proc. Natl. Acad. Sci. USA, 106 :21068–21073, 2009.
  • [6] P. J. Bickel and K. A. Doksum., Mathematical statistics: Basic ideas and selected topics–2nd edition (updated printing), volume 1. Pearson Prentice Hall, 2007.
  • [7] S. Boucheron, G. Lugosi, and P. Massart., Concentration inequalities: a nonasymptotic theory of independence. Oxford University Press, 2013.
  • [8] R. L. Buckner, J. Sepulcre, T. Talukdar, F. Krienen, H. Liu, T. Hedden, J. R. Andrews-Hanna, R. A. Sperling, and K. A. Johnson. Cortical hubs revealed by intrinsic functional connectivity: Mapping, assessment of stability, and relation to alzheimer’s disease., J Neurosci, 29(6) :1860–1873, 2009.
  • [9] K. Chen and J. Lei. Network cross-validation for determining the number of communities in network data., arXiv :1411.1715, 2014.
  • [10] P. Chin, A. Rao, and V. Vu. Stochastic block model and community detection in the sparse graphs: A spectral algorithm with optimal rate of recovery., arXiv :1501.05021, 2015.
  • [11] D. Choi and P. Wolfe. Co-clustering separately exchangeable network data., Annals of Statistics, 42(1):29–63, 2014.
  • [12] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the em algorithm., Applied statistics, pages 20–28, 1979.
  • [13] S. Fortunato. Community detection in graphs., Physics Reports, 486(3-5):75 – 174, 2010.
  • [14] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou. Achieving optimal misclassification proportion in stochastic block model., arXiv :1505.03772, 2015.
  • [15] A. Ghosh, S. Kale, and R. P. McAfee. Who moderates the moderators?: crowdsourcing abuse detection in user-generated content. In, EC, 2011.
  • [16] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models., Foundations and Trends in Machine Learning, 2:129–233, 2010.
  • [17] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: first steps., Social Networks, 5(2):109–137, 1983.
  • [18] S. H. Hosseini and S. R. Kesler. Comparing connectivity pattern and small-world organization between structural correlation and resting-state networks in healthy adults., Neuroimage, 78:402–414, 2013.
  • [19] A. Joseph and B. Yu. Impact of regularization on spectral clustering., Ann. Statist., 44(4) :1765–1791, 2016.
  • [20] B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks., Physical Review E, 83 :016107, 2011.
  • [21] F. Klimm, D. S. Bassett, J. M. Carlson, and P. J. Mucha. Resolving structural variability in network models and the brain., PLoS Comput Biol, 10(3): e1003491, 2014.
  • [22] V. Koltchinskii., Oracle inequalities in empirical risk minimization and sparse recovery problems. Springer Verlag, 2011.
  • [23] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks., Proceedings of the National Academy of Sciences, 110(52) :20935–20940, 2013.
  • [24] C. M. Le and E. Levina. Estimating the number of communities in networks by spectral methods., arXiv :1507.00827, 2015.
  • [25] C. M. Le, E. Levina, and R. Vershynin. Sparse random graphs: regularization and concentration of the laplacian., arXiv :1502.03049, 2015.
  • [26] C. M. Le, E. Levina, and R. Vershynin. Concentration and regularization of random graphs., Random Struct. Alg., doi: 10.1002/rsa.20713, 2017.
  • [27] M. Ledoux and M. Talagrand., Probability in Banach spaces: Isoperimetry and processes. Springer-Verlag, Berlin, 1991.
  • [28] K. Levin, A. Athreya, M. Tang, V. Lyzinski, and C. E. Priebe. A central limit theorem for an omnibus embedding of random dot product graphs., arXiv :1705.09355, 2017.
  • [29] Q. Liu, J. Peng, and A. T. Ihler. Variational inference for crowdsourcing., Advances in neural information processing systems, pages 692–700, 2012.
  • [30] V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, and C. E. Priebe. Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding., Electronic Journal of Statistics, 8(2) :2905–2922, 2014.
  • [31] M. Narayan and G. I. Allen. Mixed effects models for resampled network statistics improves statistical power to find differences in multi-subject functional connectivity., Frontiers in Neuroscience, 10(108), 2016.
  • [32] M. Narayan, G. I. Allen, and S. Tomson. Two sample inference for populations of graphical models with applications to functional connectivity., arXiv :1502.03853, 2015.
  • [33] M. E. J. Newman. Fast algorithm for detecting community structure in networks., Phys. Rev. E, 69(6) :066133, 2004.
  • [34] M. E. J. Newman. Modularity and community structure in networks., Proc. Natl. Acad. Sci. USA, 103(23) :8577–8582, 2006.
  • [35] S. C. Olhede and P. J. Wolfe. Network histograms and universality of blockmodel approximation., Proceedings of the National Academy of Sciences of the USA, 111 :14722–14727, 2014.
  • [36] J. D. Power, A. L. Cohen, S. M. Nelson, G. S. Wig, K. A. Barnes, J. A. Church, A. C. Vogel, T. O. Laumann, F. M. Miezin, B. L. Schlaggar, and S. E. Petersen. Functional network organization of the human brain., Neuron, 72(4):665–78, 2011.
  • [37] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds., Journal of Machine Learning Research, 11 :1297–1322, 2010.
  • [38] M. Rubinov and O. Sporns. Complex network measures of brain connectivity: uses and interpretations., NeuroImage, 52(3) :1059–1069, 2010.
  • [39] R. Tang, M. Ketcha, J. T. Vogelstein, C. E. Priebe, and D. L. Sussman. Law of large graphs., arXiv :1609.01672, 2016.
  • [40] S. Taylor, A. Chen, I. Tso, I. Liberzon, and R. Welsh. Social appraisal in chronic psychosis: Role of medial frontal and occipital networks., J. Psychiatr. Res., 45(4):526–538, 2011.
  • [41] S. Taylor, E. Demeter, K. Phan, I. Tso, and R. Welsh. Abnormal gabaergic function and negative affect in schizophrenia., Neuropsychopharmacology, 39(4) :1000–1008, 2014.
  • [42] J. A. Tropp. User-friendly tail bounds for sums of random matrices., Foundations of Computational Mathematics, 12(4):389–434, 2012.
  • [43] L. Wang, Z. Zhang, and D. Dunson. Common and individual structure of multiple networks., arXiv :1707.06360, 2017.
  • [44] R. Wang and P. Bickel. Likelihood-based model selection for stochastic block models., arXiv :1502.02069, 2015.
  • [45] M. Xia, J. Wang, and Y. He. Brainnet viewer: A network visualization tool for human brain connectomics., PLoS ONE 8: e68910, 2013.
  • [46] Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. Spectral methods meet em: A provably optimal algorithm for crowdsourcing., Advances in neural information processing systems, page 1260–1268, 2014.