RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data
Xinlei Mi, William Bekerman, Anil K. Rustgi, Peter A. Sims, Peter D. Canoll, Jianhua Hu
Ann. Appl. Stat. 18(1): 1-22 (March 2024). DOI: 10.1214/23-AOAS1761

Abstract

Applications of single-cell RNA sequencing (scRNA-seq) in various biomedical research areas have been blooming. This new technology provides unprecedented opportunities to study disease heterogeneity at the cellular level. However, unique characteristics of scRNA-seq data, including large dimensionality, high dropout rates, and possible batch effects, make such data difficult to analyze. Failing to address these issues appropriately obstructs true scientific discovery. Herein we propose a unified Regularized Zero-inflated Mixture Model framework designed for scRNA-seq data (RZiMM-scRNA) to simultaneously detect cell subgroups and identify gene differential expression based on a developed importance score, accounting for both dropouts and batch effects. We conduct extensive simulation studies in which we evaluate the performance of RZiMM-scRNA and compare it with several popular methods, including Seurat, SC3, K-means, and hierarchical clustering. Simulation results show that RZiMM-scRNA demonstrates superior clustering performance and enhanced biomarker detection accuracy compared to alternative methods, especially when cell subgroups are less distinct, verifying the robustness of our method.

Our empirical investigations focus on two brain tumor studies dealing with astrocytoma of various grades, including the most malignant of all brain tumors, glioblastoma multiforme (GBM). Our goal is to delineate cell heterogeneity and identify driving biomarkers associated with these tumors. Notably, RZiMM-scRNA successfully identifies a small group of oligodendrocyte cells, which has drawn much attention in biomedical literature on brain cancers. In addition, our method discovers several new biomarkers which are not discussed in the original studies, including PLP1, BCAN, and PTPRZ1 (all associated with the development and malignant growth of glioma), as well as CAMK2B, which is downregulated in glioma and GBM and implicated in neurodevelopment, brain function, learning and memory processes.

Funding Statement

The work was partially supported by the National Institutes of Health Grants NCI 5P30 CA013696, NCI P01 CA098101, NIAID 1R01 AI143886, and NCI 1R01 CA219896.

Citation

Xinlei Mi, William Bekerman, Anil K. Rustgi, Peter A. Sims, Peter D. Canoll, Jianhua Hu. "RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data." Ann. Appl. Stat. 18 (1) 1-22, March 2024. https://doi.org/10.1214/23-AOAS1761

Information

Received: 1 October 2021; Revised: 1 February 2023; Published: March 2024
First available in Project Euclid: 31 January 2024

MathSciNet: MR4698595
Digital Object Identifier: 10.1214/23-AOAS1761

Keywords: batch effect, clustering, dropout, mixture model, single cell

Rights: Copyright © 2024 Institute of Mathematical Statistics
