Open Access
August 2012
A geometric analysis of subspace clustering with outliers
Mahdi Soltanolkotabi, Emmanuel J. Candès
Ann. Statist. 40(4): 2195-2238 (August 2012). DOI: 10.1214/12-AOS1034


This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009) 2790–2797. IEEE], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretical analysis and demonstrates the effectiveness of these methods.
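As a rough illustration of the kind of procedure the abstract refers to, the sketch below expresses each data point as an $\ell_{1}$-minimal linear combination of the remaining points and turns the coefficients into a symmetric affinity matrix, which a spectral clustering step could then partition. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes noiseless points lying exactly on the subspaces, an equality-constrained $\ell_{1}$ program solved as a linear program, and a hypothetical function name `ssc_affinity`. The outlier-handling extension is not covered.

```python
import numpy as np
from scipy.optimize import linprog


def ssc_affinity(X):
    """Sketch of an SSC-style affinity matrix (hypothetical helper).

    X is an (n, N) array whose columns are the data points.
    Assumes every point can be written exactly as a linear
    combination of the other points (noiseless setting).
    """
    n, N = X.shape
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        A = X[:, others]
        # min ||c||_1 subject to A c = x_i, written as a linear program
        # with c = u - v and u, v >= 0.
        cost = np.ones(2 * (N - 1))
        res = linprog(
            cost,
            A_eq=np.hstack([A, -A]),
            b_eq=X[:, i],
            bounds=(0, None),
            method="highs",
        )
        if not res.success:
            raise RuntimeError(f"l1 program failed for point {i}: {res.message}")
        u, v = res.x[: N - 1], res.x[N - 1:]
        C[others, i] = u - v
    # Symmetrize the absolute coefficients to obtain graph weights.
    return np.abs(C) + np.abs(C).T
```

The resulting matrix can be handed to any spectral clustering routine. When the subspaces are linearly independent, points from different subspaces should receive (numerically) zero weight, so the graph separates by subspace.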



Mahdi Soltanolkotabi, Emmanuel J. Candès. "A geometric analysis of subspace clustering with outliers." Ann. Statist. 40 (4) 2195–2238, August 2012.


Published: August 2012
First available in Project Euclid: 23 January 2013

zbMATH: 1318.62217
MathSciNet: MR3059081
Digital Object Identifier: 10.1214/12-AOS1034

Primary: 62-07

Keywords: $\ell_{1}$ minimization, concentration of measure, duality in linear programming, geometric functional analysis, outlier detection, properties of convex bodies, spectral clustering, subspace clustering

Rights: Copyright © 2012 Institute of Mathematical Statistics
