The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 13, Number 2 (2019), 1016-1042.
Sparse principal component analysis with missing observations
Principal component analysis (PCA) is a commonly used statistical method in a wide range of applications. However, it does not work well when the number of features is larger than the sample size. Motivated by the analysis of single-cell RNA sequencing data, we consider the estimation of the sparse principal subspace in the high-dimensional setting with missing data. We propose a two-step estimation procedure and establish rates of convergence for estimating the principal subspace. Simulated examples with various missingness mechanisms show that the procedure performs competitively with existing sparse PCA methods. We apply the method to single-cell data and show that it distinguishes cell types better than other PCA methods.
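To make the setting in the abstract concrete, the following is a minimal sketch of sparse PCA on data with missing entries. It is not the authors' two-step procedure, which the abstract does not spell out. It assumes mean-zero data, entries missing completely at random with a single observation probability, and a simple diagonal-thresholding heuristic for sparsity. The function names, the spiked covariance model, and all parameter values are illustrative assumptions. Under these assumptions, zero-filling shrinks off-diagonal covariance entries by the squared observation probability and diagonal entries by the observation probability, so the sketch rescales both before extracting a sparse leading eigenvector.

```python
import numpy as np

def corrected_covariance(y_zero_filled, observed_mask):
    """Debias the second-moment matrix of zero-filled, mean-zero data.

    Assumes entries are missing completely at random with one common
    observation probability (an assumption of this sketch)."""
    n = y_zero_filled.shape[0]
    p_obs = observed_mask.mean()                 # estimated observation probability
    s = y_zero_filled.T @ y_zero_filled / n      # biased second-moment matrix
    d = np.diag(np.diag(s))
    # Off-diagonals are shrunk by p_obs**2, the diagonal by p_obs; undo both.
    return (1.0 / p_obs - 1.0 / p_obs**2) * d + s / p_obs**2

def sparse_leading_vector(cov, k):
    """Leading eigenvector restricted to the k coordinates with the largest
    estimated variance (diagonal thresholding, used here for illustration)."""
    support = np.argsort(np.diag(cov))[-k:]
    sub = cov[np.ix_(support, support)]
    _, vecs = np.linalg.eigh(sub)                # eigenvalues in ascending order
    v = np.zeros(cov.shape[0])
    v[support] = vecs[:, -1]
    return v

# Hypothetical simulation: more features than samples, one sparse spike.
rng = np.random.default_rng(0)
n, d, k, spike = 200, 300, 5, 20.0
v_true = np.zeros(d)
v_true[:k] = 1.0 / np.sqrt(k)

# Rows have covariance I + spike * v_true v_true^T.
x = rng.standard_normal((n, d)) + np.sqrt(spike) * rng.standard_normal((n, 1)) * v_true
mask = rng.random((n, d)) < 0.7                  # True where an entry is observed
y = np.where(mask, x, 0.0)                       # zero-fill the missing entries

v_hat = sparse_leading_vector(corrected_covariance(y, mask), k)
alignment = abs(v_hat @ v_true)                  # 1.0 would mean perfect recovery
print(f"|<v_hat, v_true>| = {alignment:.3f}")
```

The sign of an eigenvector is arbitrary, so the sketch reports the absolute inner product with the true direction. Missingness that depends on the underlying values, which is plausible for dropout in single-cell data, would violate the assumption behind the rescaling step.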
Received: June 2017
Revised: September 2018
First available in Project Euclid: 17 June 2019
Park, Seyoung; Zhao, Hongyu. Sparse principal component analysis with missing observations. Ann. Appl. Stat. 13 (2019), no. 2, 1016-1042. doi:10.1214/18-AOAS1220. https://projecteuclid.org/euclid.aoas/1560758436
- Supplement to “Sparse principal component analysis with missing observations”. We provide proofs of the theoretical results presented in the main paper, characteristics of the scRNA-seq data sets used, performance metrics, and additional tables and figures for the simulation and single-cell data analyses.