Bernoulli

  • Volume 21, Number 4 (2015), 2308-2335.

Geometric median and robust estimation in Banach spaces

Stanislav Minsker


Abstract

In many real-world applications, collected data are contaminated by noise with a heavy-tailed distribution and might contain outliers of large magnitude. In this situation, it is necessary to apply methods that produce reliable outcomes even if the input contains corrupted measurements. We describe a general method which allows one to obtain estimators with tight concentration around the true parameter of interest taking values in a Banach space. The suggested construction relies on the fact that the geometric median of a collection of independent “weakly concentrated” estimators satisfies a much stronger deviation bound than each individual element in the collection. Our approach is illustrated through several examples, including sparse linear regression and low-rank matrix recovery problems.
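
The construction described in the abstract can be sketched in a few lines. What follows is a minimal illustration, assuming a finite-dimensional Euclidean space rather than a general Banach space: the sample is split into disjoint blocks, the mean of each block plays the role of a "weakly concentrated" estimator, and the block means are combined through their geometric median, computed here with the Weiszfeld iteration. The function names, the number of blocks, and the stopping tolerance are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def mean_vector(points):
    """Coordinate-wise average of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def geometric_median(points, n_iter=500, tol=1e-9):
    """Approximate the point minimizing the sum of Euclidean distances
    to `points`, using the Weiszfeld fixed-point iteration."""
    dim = len(points[0])
    current = mean_vector(points)  # starting guess
    for _ in range(n_iter):
        # Each point is weighted by the inverse of its distance to the
        # current iterate; the clamp avoids division by zero.
        weights = [1.0 / max(math.dist(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


def median_of_block_means(sample, n_blocks):
    """Split `sample` into `n_blocks` disjoint blocks, average each block,
    and return the geometric median of the block means.
    Leftover points beyond n_blocks * block_size are ignored."""
    block_size = len(sample) // n_blocks
    block_means = [
        mean_vector(sample[b * block_size:(b + 1) * block_size])
        for b in range(n_blocks)
    ]
    return geometric_median(block_means)


# Illustration on synthetic data (assumed setup): Gaussian points
# centered at (1, 2), with a few gross outliers mixed in.
random.seed(0)
data = [[random.gauss(1.0, 1.0), random.gauss(2.0, 1.0)] for _ in range(300)]
for idx in (10, 120, 250):
    data[idx] = [1e5, -1e5]

print("plain mean:           ", mean_vector(data))
print("median of block means:", median_of_block_means(data, 15))
```

In this sketch an outlier can ruin the mean of the block that contains it, but the geometric median of the block means moves only slightly as long as most blocks remain uncontaminated, which is the intuition behind the deviation bound stated in the abstract.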

Article information

Source
Bernoulli, Volume 21, Number 4 (2015), 2308-2335.

Dates
Received: November 2013
Revised: May 2014
First available in Project Euclid: 5 August 2015

Permanent link to this document
https://projecteuclid.org/euclid.bj/1438777595

Digital Object Identifier
doi:10.3150/14-BEJ645

Mathematical Reviews number (MathSciNet)
MR3378468

Zentralblatt MATH identifier
1348.60041

Keywords
distributed computing; heavy-tailed noise; large deviations; linear models; low-rank matrix estimation; principal component analysis; robust estimation

Citation

Minsker, Stanislav. Geometric median and robust estimation in Banach spaces. Bernoulli 21 (2015), no. 4, 2308–2335. doi:10.3150/14-BEJ645. https://projecteuclid.org/euclid.bj/1438777595


