## The Annals of Statistics

### Sub-Gaussian estimators of the mean of a random vector

#### Abstract

We study the problem of estimating the mean of a random vector $X$ given a sample of $N$ independent, identically distributed points. We introduce a new estimator that achieves a purely sub-Gaussian performance under the only condition that the second moment of $X$ exists. The estimator is based on a novel concept of a multivariate median.

#### Article information

Source
Ann. Statist., Volume 47, Number 2 (2019), 783-794.

Dates
Revised: July 2017
First available in Project Euclid: 11 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1547197238

Digital Object Identifier
doi:10.1214/17-AOS1639

Mathematical Reviews number (MathSciNet)
MR3909950

Zentralblatt MATH identifier
07033151

#### Citation

Lugosi, Gábor; Mendelson, Shahar. Sub-Gaussian estimators of the mean of a random vector. Ann. Statist. 47 (2019), no. 2, 783–794. doi:10.1214/17-AOS1639. https://projecteuclid.org/euclid.aos/1547197238
