The Annals of Applied Statistics

Discovering political topics in Facebook discussion threads with graph contextualization

Yilin Zhang, Marie Poux-Berthe, Chris Wells, Karolina Koc-Michalska, and Karl Rohe

We propose a graph contextualization method, pairGraphText, to study political engagement on Facebook during the 2012 French presidential election. It is a spectral algorithm that contextualizes graph data with text data for online discussion threads. In particular, we examine the Facebook posts of the eight leading candidates and the comments beneath these posts. We find evidence of both (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate, and (ii) issue-centered structure (i.e., on political topics), where citizens' attention and expression are primarily directed toward a specific set of issues (e.g., economics or immigration). To identify issue-centered structure, we develop pairGraphText to analyze a network with high-dimensional features on the interactions (i.e., text). This technique scales to hundreds of thousands of nodes and thousands of unique words. In the Facebook data, spectral clustering without the contextualizing text information finds a mixture of (i) candidate clusters and (ii) issue clusters. Contextualizing the graph with the text data helps to separate these two structures. We conclude by showing that the novel methodology is consistent under a statistical model.
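
The abstract describes pairGraphText only at a high level, so the following is a hedged illustration, not the authors' algorithm. It sketches one generic way to contextualize a comment graph with text, assuming a users-by-threads incidence matrix `A` and a threads-by-words count matrix `X`: multiply them so that each user is described by the vocabulary of the threads they joined, apply a regularized degree normalization, take the top `K` singular vectors, and run k-means on users and on words separately. The function names, the regularizer (mean degree on each side), and the farthest-point k-means initialization are all assumptions made for this sketch.

```python
import numpy as np


def kmeans(Z, K, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, K):
        # Squared distance from each row to its nearest already-chosen center.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    C = np.array(centers, dtype=float)
    labels = np.zeros(Z.shape[0], dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for k in range(K):
            if np.any(labels == k):
                C[k] = Z[labels == k].mean(axis=0)
    return labels


def contextualized_spectral_coclustering(A, X, K):
    """Hypothetical sketch, NOT the published pairGraphText.

    A: (n_users, n_threads) 0/1 matrix, A[i, j] = 1 if user i commented in thread j.
    X: (n_threads, n_words) nonnegative term counts (or weights) for each thread.
    Returns cluster labels for users and for words.
    """
    # Contextualize the graph: each user inherits the words of the threads they joined.
    M = np.asarray(A, dtype=float) @ np.asarray(X, dtype=float)
    # Regularized degree normalization (assumption: tau = mean degree on each side).
    row_deg, col_deg = M.sum(axis=1), M.sum(axis=0)
    L = M / np.sqrt(row_deg + row_deg.mean())[:, None]
    L = L / np.sqrt(col_deg + col_deg.mean())[None, :]
    # Top-K singular vectors embed users (left) and words (right).
    U, _, Vt = np.linalg.svd(L, full_matrices=False)
    U, V = U[:, :K], Vt[:K].T
    # Row-normalize the embeddings before clustering.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    return kmeans(U, K), kmeans(V, K)


if __name__ == "__main__":
    # Toy data: users 0-9 comment only on threads 0-4, users 10-19 only on threads 5-9;
    # threads 0-4 use words 0-5 and threads 5-9 use words 6-11.
    A = np.zeros((20, 10))
    A[:10, :5] = 1
    A[10:, 5:] = 1
    X = np.zeros((10, 12))
    X[:5, :6] = 3
    X[5:, 6:] = 3
    users, words = contextualized_spectral_coclustering(A, X, K=2)
    # expected: users 0-9 share one label and users 10-19 the other; same for words 0-5 vs 6-11
    print(users, words)
```

On this planted toy example the sketch separates the two user groups and the two vocabularies. A real comment graph is sparse, so at the scale mentioned in the abstract a sparse matrix type and a truncated SVD would be the natural substitutes for the dense calls used here.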

Article information

Ann. Appl. Stat., Volume 12, Number 2 (2018), 1096-1123.

Received: August 2017
Revised: March 2018
First available in Project Euclid: 28 July 2018

Keywords: Network; Facebook; topic; spectral clustering; node covariate; stochastic co-Blockmodel


Zhang, Yilin; Poux-Berthe, Marie; Wells, Chris; Koc-Michalska, Karolina; Rohe, Karl. Discovering political topics in Facebook discussion threads with graph contextualization. Ann. Appl. Stat. 12 (2018), no. 2, 1096--1123. doi:10.1214/18-AOAS1191.


References

  • Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM.
  • Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
  • Bakshy, E., Messing, S. and Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science 348 1130–1132.
  • Binkiewicz, N., Vogelstein, J. T. and Rohe, K. (2017). Covariate-assisted spectral clustering. Biometrika 104 361–377.
  • Blei, D. M. (2012). Probabilistic topic models. Commun. ACM 55 77–84.
  • Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993–1022.
  • Boyd, D. N. and Ellison, N. B. (2007). Social network sites: Definition, history, and scholarship. J. Comput.-Mediat. Commun. 13 210–230.
  • Chang, J. and Blei, D. (2009). Relational topic models for document networks. In Artificial Intelligence and Statistics 81–88.
  • Chang, J. and Blei, D. M. (2010). Hierarchical relational models for document networks. Ann. Appl. Stat. 4 124–150.
  • Choy, M., Cheong, M. L., Laik, M. N. and Shung, K. P. (2011). A sentiment analysis of Singapore presidential election 2011 using Twitter data with census correction. Preprint. Available at arXiv:1108.5520.
  • Colleoni, E., Rozza, A. and Arvidsson, A. (2014). Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data. J. Commun. 64 317–332.
  • González-Bailón, S., Kaltenbrunner, A. and Banchs, R. E. (2010). The structure of political discussion networks: A model for the analysis of online deliberation. J. Inf. Technol. 25 230–243.
  • Grimmer, J. and Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 21 267–297.
  • Hebshi, S. and O'Gara (2011). The role of online social networking in the 2008 democratic presidential primary campaigns. Preprint. Available at
  • Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw. 5 109–137.
  • Joachims, T. (1996). A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. Technical report, Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.
  • Kaplan, A. M. and Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of social media. Bus. Horiz. 53 59–68.
  • Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E 83 016107.
  • Kim, Y. M. (2009). Issue publics in the new information environment: Selectivity, domain specificity, and extremity. Communic. Res. 36 254–284.
  • Kreiss, D. and McGregor, S. C. (2018). Technology firms shape political communication: The work of Microsoft, Facebook, Twitter, and Google with campaigns during the 2016 US presidential cycle. Polit. Commun. 35 155–177.
  • Kushin, M. J. and Kitchener, K. (2009). Getting political on social network sites: Exploring online political discourse on Facebook. First Monday 14 11-2.
  • Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2 1–135.
  • Papacharissi, Z. (2002). The virtual sphere: The Internet as a public sphere. New Media Soc. 4 9–27.
  • Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems 3120–3128.
  • Ramage, H. R., Connolly, L. E. and Cox, J. S. (2009). Comprehensive functional analysis of Mycobacterium tuberculosis toxin-antitoxin systems: Implications for pathogenesis, stress responses, and evolution. PLoS Genet. 5 e1000767.
  • Ramos, J. (2003). Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning.
  • Robertson, S. P., Vatrapu, R. K. and Medina, R. (2010). Off the wall political discourse: Facebook use in the 2008 US presidential election. Information Polity 15 11–31.
  • Rohe, K., Qin, T. and Yu, B. (2016). Co-clustering directed graphs to discover asymmetries and directional communities. Proc. Natl. Acad. Sci. USA 113 12679–12684.
  • Salton, G., Wong, A. and Yang, C.-S. (1975). A vector space model for automatic indexing. Commun. ACM 18 613–620.
  • Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In Proceedings Ninth IEEE International Conference on Computer Vision. IEEE.
  • Stieglitz, S. and Dang-Xuan, L. (2012). Political communication and influence through microblogging—an empirical analysis of sentiment in Twitter messages and retweet behavior. In 45th Hawaii International Conference on System Science. IEEE.
  • Stieglitz, S. and Dang-Xuan, L. (2013). Social media and political communication: A social media analytics framework. Soc. Netw. Anal. Min. 3 1277–1291.
  • Tumasjan, A., Sprenger, T. O., Sandner, P. G. and Welpe, I. M. (2011). Election forecasts with Twitter: How 140 characters reflect the political landscape. Soc. Sci. Comput. Rev. 29 402–418.
  • von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput. 17 395–416.
  • Wang, H., Can, D., Kazemzadeh, A., Bar, F. and Narayanan, S. (2012). A system for real-time Twitter sentiment analysis of 2012 US presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations 115–120. Association for Computational Linguistics.
  • Wattal, S., Schuff, D., Mandviwalla, M. and Williams, C. B. (2010). Web 2.0 and politics: The 2008 US presidential election and an e-politics research agenda. MIS Q. 669–688.
  • Webster, J. G. (2014). The Marketplace of Attention: How Audiences Take Shape in a Digital Age. MIT Press.
  • Wellman, B., Haase, A. Q., Witte, J. and Hampton, K. (2001). Does the Internet increase, decrease, or supplement social capital? Social networks, participation, and community commitment. Am. Behav. Sci. 45 436–455.
  • Williams, C. B. and Gulati, G. J. (2009). Explaining Facebook support in the 2008 congressional election cycle. OpenSIUC Working Papers, 26.
  • Williams, C. B. and Gulati, G. J. (2013). Social networks in political campaigns: Facebook and the congressional elections of 2006 and 2008. New Media Soc. 15 52–71.
  • Witten, D. M. (2011). Classification and clustering of sequencing data using a Poisson model. Ann. Appl. Stat. 5 2493–2518.
  • Zhang, Y., Poux-Berthe, M., Wells, C., Koc-Michalska, K. and Rohe, K. (2018). Supplement to “Discovering political topics in Facebook discussion threads with graph contextualization.” DOI:10.1214/18-AOAS1191SUPP.

Supplemental materials

  • Supplementary Materials for "Discovering political topics in Facebook discussion threads with graph contextualization". This supplement consists of five parts. Part 1 provides further evidence for the candidate-centered structure. Part 2 explains our choice of the number of clusters $K$ when searching for the issue-centered structure. Part 3 discusses different choices of document-term matrix. Part 4 provides additional simulations comparing pairGraphText with RTM and other methods, including CASC and spectral clustering. Part 5 provides theoretical justification for pairGraphText.
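
The specific document-term matrices compared in Part 3 are not listed on this page. As one generic example of how a document-term matrix can be reweighted (an assumption, not necessarily a choice the supplement examines), the sketch below applies the plain TF-IDF weighting discussed by Ramos (2003) to raw counts, giving word w in document d the weight tf_dw · log(n / df_w), where n is the number of documents and df_w is the number of documents containing w. The function name `tfidf` is hypothetical.

```python
import numpy as np


def tfidf(counts):
    """Reweight a (documents x words) count matrix with plain TF-IDF.

    Uses tf * log(n_docs / df); smoothed variants used by other tools differ.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)              # number of documents containing each word
    idf = np.log(n_docs / np.maximum(df, 1))   # guard df = 0; such words have tf = 0 anyway
    return counts * idf                        # broadcast idf across every document row


if __name__ == "__main__":
    counts = np.array([[1, 2, 0],
                       [1, 0, 3]])
    # Word 0 appears in every document, so its TF-IDF weight is 0 in both rows.
    print(tfidf(counts))
```

The design choice TF-IDF encodes is that words appearing in nearly every thread carry little information for separating topics, so they are down-weighted relative to raw counts.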