The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 6, Number 3 (2012), 1118-1133.
The importance of distinct modeling strategies for gene and gene-specific treatment effects in hierarchical models for microarray data
When analyzing microarray data, hierarchical models are often used to share information across genes when estimating means and variances or identifying differential expression. Many methods use some form of the two-level hierarchical model structure suggested by Kendziorski et al. [Stat. Med. (2003) 22 3899–3914], in which the first level describes the distribution of latent mean expression levels among genes and among differentially expressed treatments within a gene. The second level describes the conditional distribution, given a latent mean, of repeated observations for a single gene and treatment. Many of these models, including those used in the EBarrays package of Kendziorski et al. [Stat. Med. (2003) 22 3899–3914], assume that expression level changes due to treatment effects have the same distribution as expression level changes from gene to gene. We present empirical evidence that this assumption is often inadequate and propose three-level hierarchical models as extensions to the two-level log-normal based EBarrays models to address this inadequacy. We demonstrate that use of our three-level models dramatically changes analysis results for a variety of microarray data sets, and we verify the validity and improved performance of our suggested method in a series of simulation studies. We also illustrate the importance of accounting for the uncertainty of gene-specific error variance estimates when using hierarchical models to identify differentially expressed genes.
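To make the contrast concrete, here is a schematic normal version of the two hierarchies on the log scale. The notation ($\mu_0$, $\tau^2$, $\tau_G^2$, $\tau_T^2$, and a gene-specific error variance $\sigma_g^2$) is ours, and the exact EBarrays and three-level parameterizations in the paper may differ. The two-level line shows the differentially expressed case, where each treatment mean within gene $g$ is drawn from the same distribution used for gene-to-gene variation.

```latex
% Schematic only; notation is illustrative, not the paper's.
% Two-level model (differentially expressed gene g, treatment t, replicate j):
\begin{align*}
  y_{gtj} \mid \mu_{gt} &\sim N(\mu_{gt}, \sigma_g^2), &
  \mu_{gt} &\overset{\text{iid}}{\sim} N(\mu_0, \tau^2).
\end{align*}
% Three-level model: separate gene effects and gene-specific treatment effects:
\begin{align*}
  \mu_g &\sim N(\mu_0, \tau_G^2), &
  \mu_{gt} \mid \mu_g &\sim N(\mu_g, \tau_T^2), &
  y_{gtj} \mid \mu_{gt} &\sim N(\mu_{gt}, \sigma_g^2).
\end{align*}
```

Under the three-level sketch, two treatment means from the same gene share the gene effect, so $E[(\mu_{gt} - \mu_{gt'})^2] = 2\tau_T^2$, whereas means from different genes differ by $2(\tau_G^2 + \tau_T^2)$ on average. The two-level model corresponds to the special case $\tau_G^2 = 0$, $\tau_T^2 = \tau^2$, which forces within-gene treatment differences to be as large as between-gene differences.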
First available in Project Euclid: 31 August 2012
Lund, Steven P.; Nettleton, Dan. The importance of distinct modeling strategies for gene and gene-specific treatment effects in hierarchical models for microarray data. Ann. Appl. Stat. 6 (2012), no. 3, 1118--1133. doi:10.1214/12-AOAS535. https://projecteuclid.org/euclid.aoas/1346418576
- Supplementary material: Additional evidence supporting need for three-level hierarchy and simulation study details. The correlation across genes present in real microarray data makes directly testing the statistical significance of gene effect variance estimates intractable. We present a simulation study that demonstrates the gene effect variance estimates obtained when analyzing the DC3000 and mouse diet data sets are drastically greater than those that arise when analyzing data simulated without gene effects. We also provide detailed accounts of simulation procedures and results used to evaluate the considered methods. These simulations clearly support our claims regarding the importance of distinctly modeling gene and gene-specific treatment effects and accounting for uncertainty in error variance estimators.
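The supplementary simulation compares gene effect variance estimates from real data with those from data simulated without gene effects. A minimal sketch of that kind of comparison follows. It rests on assumptions of our own: a balanced design, independent genes (unlike real arrays, which are correlated across genes, as noted above), normal log-scale expression, and simple ANOVA-style method-of-moments estimators rather than whatever estimators the authors used. The functions `simulate` and `moment_estimates` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(n_genes, n_trt, n_rep, var_gene, var_trt, var_err, mu0=8.0, seed=1):
    """Simulate log-scale expression y[g][t][j] from a three-level normal hierarchy.

    Hypothetical illustration. Setting var_gene=0 gives data "without gene effects",
    i.e. every treatment mean is drawn from one common distribution (two-level case).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_genes):
        gene_mean = rng.gauss(mu0, var_gene ** 0.5)           # gene effect
        cells = []
        for _ in range(n_trt):
            trt_mean = rng.gauss(gene_mean, var_trt ** 0.5)   # gene-specific treatment effect
            # Replicate observations around the treatment mean.
            cells.append([rng.gauss(trt_mean, var_err ** 0.5) for _ in range(n_rep)])
        data.append(cells)
    return data


def moment_estimates(data):
    """Moment estimates of (var_gene, var_trt, var_err) for a balanced design (assumed)."""
    n_trt, n_rep = len(data[0]), len(data[0][0])
    # Pooled within-cell sample variance estimates the error variance.
    var_err = statistics.fmean(statistics.variance(cell) for gene in data for cell in gene)
    # Within a gene, cell means vary with expectation var_trt + var_err / n_rep.
    cell_means = [[statistics.fmean(cell) for cell in gene] for gene in data]
    within_gene = statistics.fmean(statistics.variance(m) for m in cell_means)
    var_trt = within_gene - var_err / n_rep
    # Gene means vary with expectation var_gene + var_trt / n_trt + var_err / (n_trt * n_rep).
    gene_means = [statistics.fmean(m) for m in cell_means]
    var_gene = statistics.variance(gene_means) - var_trt / n_trt - var_err / (n_trt * n_rep)
    return var_gene, var_trt, var_err


if __name__ == "__main__":
    with_gene = moment_estimates(simulate(2000, 2, 3, var_gene=1.0, var_trt=0.05, var_err=0.04))
    without_gene = moment_estimates(simulate(2000, 2, 3, var_gene=0.0, var_trt=1.0, var_err=0.04, seed=2))
    print("with gene effects:   ", with_gene)
    print("without gene effects:", without_gene)
```

With gene effects present, the estimated gene effect variance is large relative to the treatment variance; when the data are simulated without gene effects, it scatters around zero (and can be slightly negative, a known feature of moment estimators).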