A quantitative linguistic analysis of a cancer online health community with a smooth latent space model
Mengque Liu, Xinyan Fan, Shuangge Ma
Ann. Appl. Stat. 18(1): 144-158 (March 2024). DOI: 10.1214/23-AOAS1783


Online health communities (OHCs) provide free, open, and well-resourced platforms for patients, family members, and others to discuss illnesses, express feelings, and connect with others. Linguistic analysis of OHC posts can help in better understanding disease conditions and in monitoring the emotional and mental status of patients and those closely related to them. Many existing OHC linguistic analyses are limited by their focus on individual words. There are a handful of cooccurrence network analyses, but they have multiple methodological limitations. In this article we analyze posts that are publicly available at the LUNGevity Foundation’s Lung Cancer Support Community (LCSC). The analyzed data contain 21,028 posts published between April 2018 and February 2022. For word cooccurrence network analysis, we develop a two-part latent space model, which improves on existing ones by accommodating network weights. Further, we consider the scenario in which there are change points in time: networks remain the same between two change points but differ on the two sides of a change point, and the number and locations of change points are unknown. A penalized fusion approach is developed to data-dependently determine change points and estimate networks. In the data analysis, multiple change points are identified, which reflect significant changes in the emotional/mental status of lung cancer patients and their close affiliates and mostly align with changes in the COVID-19 pandemic. The obtained network structures and other findings are also sensible.
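The abstract describes the method only at a high level. As a rough illustration of its three ingredients (a weighted word cooccurrence network, a two-part model for edge presence and edge weight, and a fusion penalty whose zero differences mark periods without a change), here is a minimal Python sketch under stated assumptions. The Poisson hurdle for weights, the Euclidean-distance link, the intercepts `alpha` and `beta`, the group-fused Frobenius penalty on latent positions, and all function names are hypothetical choices for illustration, not the authors' specification. The sketch only evaluates the objective; it does not fit latent positions or select the tuning parameter.

```python
import itertools
import math

import numpy as np


def cooccurrence_matrix(posts, vocab):
    """Weighted cooccurrence network: W[i, j] = number of posts containing both words."""
    index = {w: k for k, w in enumerate(vocab)}
    W = np.zeros((len(vocab), len(vocab)), dtype=int)
    for tokens in posts:
        present = sorted({index[t] for t in tokens if t in index})
        for i, j in itertools.combinations(present, 2):
            W[i, j] += 1
            W[j, i] += 1
    return W


def two_part_loglik(W, Z, alpha, beta):
    """Hypothetical two-part (hurdle) latent space log-likelihood for one period.

    Part 1 (presence): P(W_ij > 0) = sigmoid(alpha - ||z_i - z_j||).
    Part 2 (weight):   W_ij - 1 | W_ij > 0 ~ Poisson(exp(beta - ||z_i - z_j||)).
    """
    ll = 0.0
    for i, j in itertools.combinations(range(W.shape[0]), 2):
        d = np.linalg.norm(Z[i] - Z[j])
        eta = alpha - d
        if W[i, j] > 0:
            k = int(W[i, j]) - 1
            rate = math.exp(beta - d)
            ll += -np.logaddexp(0.0, -eta)  # log sigmoid(eta), computed stably
            ll += k * (beta - d) - rate - math.lgamma(k + 1)  # Poisson log-pmf of k
        else:
            ll += -np.logaddexp(0.0, eta)  # log(1 - sigmoid(eta))
    return float(ll)


def fusion_penalty(Z_seq, lam):
    """Group-fused penalty lam * sum_t ||Z_t - Z_{t-1}||_F over consecutive periods."""
    return lam * sum(np.linalg.norm(Z_seq[t] - Z_seq[t - 1]) for t in range(1, len(Z_seq)))


def penalized_objective(W_seq, Z_seq, alpha, beta, lam):
    """Negative log-likelihood summed over periods plus the fusion penalty (to be minimized)."""
    nll = -sum(two_part_loglik(W, Z, alpha, beta) for W, Z in zip(W_seq, Z_seq))
    return nll + fusion_penalty(Z_seq, lam)


def change_points(Z_seq, tol=1e-8):
    """Periods t whose (fused) latent positions differ from those of period t - 1."""
    return [t for t in range(1, len(Z_seq)) if np.linalg.norm(Z_seq[t] - Z_seq[t - 1]) > tol]


if __name__ == "__main__":
    vocab = ["chemo", "cough", "covid", "scan"]
    posts = [["cough", "scan", "chemo"], ["scan", "chemo"], ["covid", "scan"]]
    W = cooccurrence_matrix(posts, vocab)
    print(W)
    print(two_part_loglik(W, np.zeros((4, 2)), alpha=0.0, beta=0.0))
    Z0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    Z1 = Z0.copy()
    Z1[2] = [3.0, 3.0]  # one word moves, so pairwise distances change
    print(change_points([Z0, Z0, Z1, Z1]))  # [2]
```

Because the penalty uses a non-squared norm of each consecutive difference, a sufficiently large `lam` tends to make some differences exactly zero, so consecutive periods share one network and the remaining nonzero differences indicate change points. Fitting would mean minimizing `penalized_objective` over `Z_seq` (and the intercepts) and choosing `lam` data-dependently, which this sketch leaves out.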

Funding Statement

This study was partly supported by the National Natural Science Foundation of China (NSFC, 12201626), Public Computing Cloud of Renmin University of China, NSF (2209685), and NIH (CA196530).


We thank the Editor and reviewers for their careful review and insightful comments. Fan and Ma are joint corresponding authors.


Citation

Mengque Liu. Xinyan Fan. Shuangge Ma. "A quantitative linguistic analysis of a cancer online health community with a smooth latent space model." Ann. Appl. Stat. 18 (1) 144 - 158, March 2024.


Received: 1 October 2022; Revised: 1 April 2023; Published: March 2024
First available in Project Euclid: 31 January 2024

Digital Object Identifier: 10.1214/23-AOAS1783

Keywords: cancer, cooccurrence network, online health community, quantitative linguistic analysis, smooth latent space model

Rights: Copyright © 2024 Institute of Mathematical Statistics

