## The Annals of Applied Statistics

### Modeling node incentives in directed networks

Deepayan Chakrabarti

#### Abstract

Twitter is a popular medium for individuals to gather information and express opinions on topics of interest to them. By understanding who is interested in what topics, we can gauge the public mood, especially during periods of polarization such as elections. However, while the total volume of tweets may be huge, many people tweet rarely, and tweets are short and often noisy. Hence, directly inferring topics from tweets is both complicated and difficult to scale. Instead, the network structure of Twitter (who tweets at whom, who follows whom) can telegraph the interests of Twitter users. We propose the Producer-Consumer Model (PCM) to link latent topical interests of individuals to the directed structure of the network. A key component of PCM is the modeling of incentives of Twitter users. In particular, for a user to attract more followers and become popular, she must strive to be perceived as an expert on some topic. We use this to reduce the parameter space of PCM, greatly increasing its scalability. We apply PCM to track the evolution of Twitter topics during the Italian Elections of 2013, and also to interpret those topics using hashtags. A secondary application of PCM to a citation network of machine learning papers is also shown. Extensive simulations and experiments with large real-world datasets demonstrate the accuracy and scalability of PCM.

#### Article information

Source
Ann. Appl. Stat., Volume 11, Number 4 (2017), 2298–2331.

Dates
Revised: May 2017
First available in Project Euclid: 28 December 2017

https://projecteuclid.org/euclid.aoas/1514430287

Digital Object Identifier
doi:10.1214/17-AOAS1079

Mathematical Reviews number (MathSciNet)
MR3743298

Zentralblatt MATH identifier
1383.62331

#### Citation

Chakrabarti, Deepayan. Modeling node incentives in directed networks. Ann. Appl. Stat. 11 (2017), no. 4, 2298–2331. doi:10.1214/17-AOAS1079. https://projecteuclid.org/euclid.aoas/1514430287

#### Supplemental materials

• Supplement A: Proofs. We provide the proofs for all propositions and theorems.