Open Access
June 2024
Accurate estimation of rare cell-type fractions from tissue omics data via hierarchical deconvolution
Penghui Huang, Manqi Cai, Xinghua Lu, Chris McKennan, Jiebiao Wang
Ann. Appl. Stat. 18(2): 1178-1194 (June 2024). DOI: 10.1214/23-AOAS1829


Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. As such, it is critical to estimate cellular fractions to both deconfound differential expression analyses and infer cell type-specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, in silico cellular deconvolution methods have been developed as an alternative. However, existing methods are designed for tissues consisting of clearly distinguishable cell types and have difficulties estimating highly correlated or rare cell types. To address this challenge, we propose hierarchical deconvolution (HiDecon) that uses single-cell RNA sequencing references and a hierarchical cell-type tree, which models the similarities among cell types and cell differentiation relationships, to estimate cellular fractions in bulk data. By coordinating cell fractions across layers of the hierarchical tree, cellular fraction information is passed up and down the tree, which helps correct estimation biases by pooling information across related cell types. The flexible hierarchical tree structure also enables estimating rare cell fractions by splitting the tree to higher resolutions. Through simulations and real data applications with the ground truth of measured cellular fractions, we demonstrate that HiDecon outperforms existing methods and accurately estimates cellular fractions. Finally, we show the utility of HiDecon estimates in identifying the associations between cellular fractions and Alzheimer’s disease.
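The mixing idea in the abstract (bulk expression as a fraction-weighted average of cell-type profiles) can be illustrated with a toy example. The sketch below is not HiDecon: it omits the hierarchical tree and any penalty, and simply recovers fractions from a simulated, noise-free bulk sample by non-negative least squares, solved here with plain projected gradient descent. The reference matrix, the fractions, and the function names `mix` and `deconvolve` are all hypothetical and chosen only for illustration.

```python
# Toy reference-based deconvolution. All numbers and names are hypothetical;
# this is a plain non-negative least-squares sketch, not the HiDecon method.


def mix(reference, fractions):
    """Bulk expression per gene: fraction-weighted average of cell-type profiles."""
    return [sum(r * f for r, f in zip(row, fractions)) for row in reference]


def deconvolve(reference, bulk, n_iter=2000):
    """Estimate fractions by projected gradient descent on the squared error,
    keeping fractions non-negative and rescaling them to sum to one."""
    n_types = len(reference[0])
    # Step size 1/L: the squared Frobenius norm bounds the gradient's Lipschitz constant.
    step = 1.0 / sum(v * v for row in reference for v in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [m - b for m, b in zip(mix(reference, fractions), bulk)]
        gradient = [
            sum(row[k] * r for row, r in zip(reference, residual))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        fractions = [max(0.0, f - step * g) for f, g in zip(fractions, gradient)]
    total = sum(fractions)
    return [f / total for f in fractions]


# Rows are genes, columns are cell types (mean expression per cell type).
reference = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 2.0],
    [1.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.70, 0.25, 0.05]  # the third cell type is rare
bulk = mix(reference, true_fractions)  # noise-free simulated bulk sample
estimated = deconvolve(reference, bulk)
print([round(f, 4) for f in estimated])
```

Because the simulated bulk sample is noise-free and the three profiles are clearly distinguishable, the estimate should match the true fractions closely. With noisy data and highly correlated or rare cell types, this kind of unstructured fit becomes unreliable, which is the setting the abstract says the hierarchical tree is meant to address.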


This research was funded in part through NIH’s R01AG080590, R03OD034501, and R01MH123184, and grants from the University of Pittsburgh Brain Institute and Competitive Medical Research Fund of the UPMC Health System. This research was supported in part by the University of Pittsburgh Center for Research Computing through the resources provided.

The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (Contract No. N01-HC-25195, HHSN268201500001I and 75N92019D00031). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI.

The results published here are, in part, based on data obtained from the AD Knowledge Portal. Study data were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161 (ROS), R01AG15819 (ROSMAP; genomics and RNAseq), R01AG17917 (MAP), R01AG36836 (RNAseq), U01AG46152 (ROSMAP AMP-AD, targeted proteomics), U01AG61356 (whole genome sequencing, targeted proteomics, ROSMAP AMP-AD), and the Illinois Department of Public Health (ROSMAP). Additional phenotypic data can be requested at

The authors would like to express their gratitude for the constructive suggestions from the Editor, Associate Editor, and the two referees.



Penghui Huang. Manqi Cai. Xinghua Lu. Chris McKennan. Jiebiao Wang. "Accurate estimation of rare cell-type fractions from tissue omics data via hierarchical deconvolution." Ann. Appl. Stat. 18 (2) 1178 - 1194, June 2024.


Received: 1 March 2023; Revised: 1 September 2023; Published: June 2024
First available in Project Euclid: 5 April 2024


Keywords: Cellular deconvolution, hierarchical tree, penalized regression, RNA sequencing, single-cell data

Rights: Copyright © 2024 Institute of Mathematical Statistics
