Open Access
Sparse integrative clustering of multiple omics data sets
Ronglai Shen, Sijian Wang, Qianxing Mo
Ann. Appl. Stat. 7(1): 269-294 (March 2013). DOI: 10.1214/12-AOAS578


High-resolution microarrays and second-generation sequencing platforms are powerful tools for investigating genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach provides an integrated view of the data that would not be available from any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267–288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301–320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91–108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that contribute significantly to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek “experimental” points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compare our method to sparse singular value decomposition (SVD) and a penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets.
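The abstract states that the sparse coefficient vectors are computed by iterative ridge regression. The idea can be sketched for a plain lasso regression (this is not the paper's full latent variable model or its estimation updates) with the local quadratic approximation |b_j| ≈ b_j² / (2|b_j_old|) + const: each step then becomes a ridge problem whose penalty on b_j is lam / |b_j_old|, which grows as a coefficient shrinks toward zero. The function name `lasso_iterative_ridge`, the tolerances and the final zero threshold are hypothetical choices made for illustration.

```python
import numpy as np


def lasso_iterative_ridge(X, y, lam, n_iter=100, eps=1e-8, tol=1e-6, zero_tol=1e-4):
    """Approximate argmin_b 0.5 * ||y - X b||^2 + lam * ||b||_1 by iterative ridge.

    Hypothetical helper: each |b_j| is replaced by the local quadratic
    b_j^2 / (2 |b_j_old|) + const, so every iteration solves the ridge system
    (X^T X + diag(lam / |b_old|)) b = X^T y.
    """
    p = X.shape[1]
    XtX = X.T @ X
    Xty = X.T @ y
    # Start from a lightly regularized least-squares fit.
    b = np.linalg.solve(XtX + 1e-6 * np.eye(p), Xty)
    for _ in range(n_iter):
        # Coefficient-specific ridge penalties: large where b_old is near zero.
        D = np.diag(lam / (np.abs(b) + eps))
        b_new = np.linalg.solve(XtX + D, Xty)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    # Coefficients driven toward zero never reach it exactly; threshold them.
    b[np.abs(b) < zero_tol] = 0.0
    return b
```

With an orthonormal design the lasso solution is soft-thresholding of X^T y, which gives an easy check: for X equal to the first three columns of the 5×5 identity, y = (3, 0.5, −2, 1, 1) and lam = 1, the iterates converge to approximately (2, 0, −1).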

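The uniform design used for tuning parameter search can be illustrated with the good lattice point (glp) construction, one standard way to build such designs: for n runs and s factors, a generating vector h with gcd(h_k, n) = 1 yields points ((i·h_k mod n) − 0.5)/n, and the vector giving the smallest centered L2 discrepancy is kept. This is a sketch under assumptions; the paper's actual design table, number of runs and tuning parameter ranges are not given here, and the helpers `glp_uniform_design`, `centered_l2_discrepancy` and `scale_design` are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def centered_l2_discrepancy(x):
    """Centered L2 discrepancy (Hickernell) of points x in [0, 1]^s, shape (n, s)."""
    n, s = x.shape
    z = np.abs(x - 0.5)
    term1 = (13.0 / 12.0) ** s
    term2 = (2.0 / n) * np.sum(np.prod(1 + 0.5 * z - 0.5 * z ** 2, axis=1))
    # Pairwise term over all (i, j), built by broadcasting.
    dij = np.abs(x[:, None, :] - x[None, :, :])
    pair = 1 + 0.5 * z[:, None, :] + 0.5 * z[None, :, :] - 0.5 * dij
    term3 = np.sum(np.prod(pair, axis=2)) / n ** 2
    return math.sqrt(max(term1 - term2 + term3, 0.0))


def glp_uniform_design(n, s):
    """Search generating vectors exhaustively; return the lowest-discrepancy glp set."""
    candidates = [h for h in range(1, n) if math.gcd(h, n) == 1]
    rows = np.arange(1, n + 1)[:, None]
    best, best_cd = None, float("inf")
    for hs in combinations(candidates, s):
        q = (rows * np.array(hs)[None, :]) % n
        q[q == 0] = n  # map residue 0 to level n, so each column holds levels 1..n
        x = (q - 0.5) / n
        cd = centered_l2_discrepancy(x)
        if cd < best_cd:
            best, best_cd = x, cd
    return best, best_cd


def scale_design(x, lows, highs):
    """Map unit-cube design points onto tuning parameter ranges."""
    lows, highs = np.asarray(lows, dtype=float), np.asarray(highs, dtype=float)
    return lows + x * (highs - lows)
```

For example, `glp_uniform_design(13, 2)` returns 13 points in which each coordinate takes every level (k − 0.5)/13 exactly once, and `scale_design` maps them onto two hypothetical penalty ranges. The exhaustive search over generating vectors is only practical for small n and s; larger problems would normally use tabulated uniform designs.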


Ronglai Shen. Sijian Wang. Qianxing Mo. "Sparse integrative clustering of multiple omics data sets." Ann. Appl. Stat. 7 (1) 269 - 294, March 2013.


Published: March 2013
First available in Project Euclid: 9 April 2013

zbMATH: 06171272
MathSciNet: MR3086419
Digital Object Identifier: 10.1214/12-AOAS578

Keywords: latent variable approach, penalized regression, sparse integrative clustering

Rights: Copyright © 2013 Institute of Mathematical Statistics
