## Electronic Journal of Statistics

### Nonparametric link prediction in large scale dynamic networks

#### Abstract

We propose a nonparametric approach to link prediction in large-scale dynamic networks. Our model uses graph-based features of pairs of nodes as well as those of their local neighborhoods to predict whether those nodes will be linked at each time step. The model allows for different types of evolution in different parts of the graph (e.g., growing or shrinking communities). We focus on large-scale graphs and present an implementation of our model that makes use of locality-sensitive hashing to allow it to be scaled to large problems. Experiments with simulated data as well as five real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or nonlinearities are present. We also establish theoretical properties of our estimator, in particular consistency and weak convergence, the latter making use of an elaboration of Stein’s method for dependency graphs.

#### Article information

Source
Electron. J. Statist., Volume 8, Number 2 (2014), 2022-2065.

Dates
First available in Project Euclid: 29 October 2014

https://projecteuclid.org/euclid.ejs/1414588186

Digital Object Identifier
doi:10.1214/14-EJS943

Mathematical Reviews number (MathSciNet)
MR3273618

Zentralblatt MATH identifier
1302.62096

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 91D30: Social networks

#### Citation

Sarkar, Purnamrita; Chakrabarti, Deepayan; Jordan, Michael. Nonparametric link prediction in large scale dynamic networks. Electron. J. Statist. 8 (2014), no. 2, 2022–2065. doi:10.1214/14-EJS943. https://projecteuclid.org/euclid.ejs/1414588186
