June 2024 Information-incorporated clustering analysis of disease prevalence trends
Chenjin Ma, Cunjie Lin, Yuan Xue, Sanguo Zhang, Qingzhao Zhang, Shuangge Ma
Ann. Appl. Stat. 18(2): 1035-1050 (June 2024). DOI: 10.1214/23-AOAS1821


In biomedical research, the analysis of disease prevalence is of critical importance. While most existing prevalence studies focus on individual diseases, there has been increasing effort to jointly examine the prevalence values, and their trends, of multiple diseases. Such joint analysis can provide valuable insights that individual-disease analysis cannot. A critical limitation of existing analyses is that they pay little attention to existing information, which has accumulated through a large number of studies and can be especially valuable when there are many diseases but only a limited number of prevalence values for each disease. In this study, we conduct functional clustering analysis of the prevalence trends of a large number of diseases. A novel approach based on the penalized fusion technique is developed to incorporate information mined from published articles. It is designed to account for the possibility that such information may not be fully relevant or correct. Another significant development is that the statistical properties of the approach are rigorously established. Simulations demonstrate its competitive performance. In the analysis of data from the Taiwan NHIRD (National Health Insurance Research Database), new and interesting findings that differ from existing ones are obtained.
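The abstract does not spell out the estimator, so the following is only a rough sketch of the generic idea behind penalized fusion clustering (often called convex clustering), not the authors' method. Several pieces are assumptions introduced for illustration: each disease's trend is summarized by polynomial coefficients in place of a proper functional basis, the objective is solved with a plain ADMM loop, and "existing information" enters through a hypothetical weighting rule w_ij = 1 + gamma * prior[i, j], where `prior[i, j] = 1` flags a disease pair reported as related in the literature. The function names `trend_coefficients` and `fusion_cluster` are hypothetical.

```python
import numpy as np


def trend_coefficients(years, prevalence, degree=2):
    """Summarize each disease's prevalence trend by polynomial coefficients.

    Hypothetical stand-in for a functional basis expansion (assumption).
    prevalence has shape (n_diseases, n_years); returns (n_diseases, degree + 1).
    """
    t = (years - years.mean()) / years.std()  # standardized time for numerical stability
    return np.array([np.polyfit(t, row, degree) for row in prevalence])


def fusion_cluster(X, lam, prior=None, gamma=1.0, nu=1.0, n_iter=2000, zero_tol=1e-4):
    """Convex (penalized fusion) clustering of the rows of X, solved by ADMM.

    Minimizes 0.5 * ||X - U||_F^2 + lam * sum_{i<j} w_ij * ||u_i - u_j||_2.
    Hypothetical prior weighting: w_ij = 1 + gamma * prior[i, j].
    Returns the fitted centers U and integer cluster labels.
    """
    n, p = X.shape
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    m = len(edges)
    D = np.zeros((m, n))  # edge-incidence matrix: (D @ U)[e] = u_i - u_j
    w = np.ones(m)
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
        if prior is not None:
            w[e] += gamma * prior[i, j]

    V = D @ X                      # split variable for pairwise differences
    Z = np.zeros((m, p))           # scaled dual variable
    A = np.eye(n) + nu * D.T @ D   # fixed system matrix of the U-update
    for _ in range(n_iter):
        U = np.linalg.solve(A, X + nu * D.T @ (V - Z))
        B = D @ U + Z
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        # Group soft-thresholding: proximal map of (lam * w_e / nu) * ||.||_2.
        V = np.maximum(0.0, 1.0 - (lam * w[:, None] / nu) / np.maximum(norms, 1e-12)) * B
        Z += D @ U - V

    # Diseases joined by a fused edge (difference shrunk to ~0) share a cluster.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for e, (i, j) in enumerate(edges):
        if np.linalg.norm(V[e]) < zero_tol:
            parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return U, np.array([ids[r] for r in roots])


if __name__ == "__main__":
    # Made-up toy data: three rising and three falling trends, in percent.
    years = np.arange(2000, 2010, dtype=float)
    k = years - years[0]
    offsets = [0.0, 0.1, 0.2]
    prev = np.array([1.0 + 0.2 * k + o for o in offsets]
                    + [5.0 - 0.1 * k + o for o in offsets])
    _, labels = fusion_cluster(trend_coefficients(years, prev), lam=0.2)
    print(labels)
```

In this sketch the literature-based prior only rescales a penalty weight. A flagged pair is pulled together more strongly, but it still stays apart when the data separate the two trends by enough. This loosely echoes the abstract's concern that mined information may be partly irrelevant or wrong; how the paper itself handles that is not described here.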

Funding Statement

This study was supported by China Postdoctoral Science Foundation (2022M720328), CSIAM Research Project for Young Women in Applied Mathematics, Beijing Postdoctoral Research Foundation, National Natural Science Foundation of China (11971404, 11701561), 111 Project (B13028), National Statistical Science Research Project (2019LZ22), Fund for building world-class universities (disciplines) of Renmin University of China, NSF (1916251, 2209685), and a Yale Cancer Center Pilot Award.


The authors thank the Editor and reviewers for their careful review and insightful comments, which have led to a major improvement of this article.


Citation

Chenjin Ma. Cunjie Lin. Yuan Xue. Sanguo Zhang. Qingzhao Zhang. Shuangge Ma. "Information-incorporated clustering analysis of disease prevalence trends." Ann. Appl. Stat. 18 (2) 1035 - 1050, June 2024. https://doi.org/10.1214/23-AOAS1821


Received: 1 September 2022; Revised: 1 June 2023; Published: June 2024
First available in Project Euclid: 5 April 2024

Digital Object Identifier: 10.1214/23-AOAS1821

Keywords: clustering, disease prevalence trends, information incorporation, penalized fusion

Rights: Copyright © 2024 Institute of Mathematical Statistics



Vol.18 • No. 2 • June 2024