Open Access
ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels
Geoffrey Chinot
Electron. J. Statist. 14(2): 3563-3605 (2020). DOI: 10.1214/20-EJS1754

Abstract

We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and $L$-Lipschitz loss functions. We consider a setting where $|{\mathcal{O}}|$ malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the $L_{2}$-error rate is bounded by $r_{N}+AL|{\mathcal{O}}|/N$, where $N$ is the total number of observations, $r_{N}$ is the $L_{2}$-error rate in the non-contaminated setting and $A$ is a parameter coming from the local Bernstein condition. When $r_{N}$ is minimax-rate-optimal in the non-contaminated setting, the rate $r_{N}+AL|{\mathcal{O}}|/N$ is also minimax-rate-optimal when $|{\mathcal{O}}|$ outliers contaminate the labels. The main results of the paper apply to many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber’s M-estimators (without penalization or regularized by the $\ell _{1}$-norm) and for general regularized learning problems in reproducing kernel Hilbert spaces when the noise can be heavy-tailed.
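
As a rough illustration of the setting described above (not the paper's own experiments), the following Python sketch fits an unpenalized Huber M-estimator by gradient descent on labels in which $|{\mathcal{O}}|$ entries have been replaced by arbitrarily large values. The function names `huber_erm` and `simulate`, the Gaussian design, the sample sizes, the threshold `delta` and the particular contamination scheme are all assumptions chosen for the example. The Huber loss with threshold `delta` is convex and `delta`-Lipschitz, so it falls within the class of losses the abstract considers.

```python
import numpy as np

def huber_grad(residual, delta):
    # Derivative of the Huber loss in the residual: the residual clipped to [-delta, delta].
    return np.clip(residual, -delta, delta)

def huber_erm(X, y, delta=1.0, n_iter=500):
    # Plain gradient descent on the empirical Huber risk (1/n) * sum_i huber(x_i^T w - y_i).
    n, d = X.shape
    # The empirical risk has Hessian bounded by X^T X / n, so 1 / (||X||_2^2 / n) is a safe step size.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        residual = X @ w - y
        w -= step * X.T @ huber_grad(residual, delta) / n
    return w

def simulate(n_outliers, n=2000, d=5, seed=0):
    # Hypothetical data model: Gaussian design, Gaussian noise, true parameter of all ones.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_star = np.ones(d)
    y = X @ w_star + rng.standard_normal(n)
    # Corrupt the first n_outliers labels with huge values correlated with the first feature.
    y[:n_outliers] = 1e6 * np.sign(X[:n_outliers, 0])
    w_hat = huber_erm(X, y)
    # For an isotropic Gaussian design, the L2 error equals the Euclidean parameter error.
    return np.linalg.norm(w_hat - w_star)

if __name__ == "__main__":
    for k in (0, 20, 100):
        print(k, simulate(k))
```

Because the gradient of each corrupted observation is clipped at `delta`, the $|{\mathcal{O}}|$ outliers can shift the average gradient by at most an amount of order `delta` times $|{\mathcal{O}}|/N$, however large the corrupted labels are. This is the intuition behind the additive $AL|{\mathcal{O}}|/N$ term in the rate.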

Citation

Geoffrey Chinot. "ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels." Electron. J. Statist. 14 (2) 3563 - 3605, 2020. https://doi.org/10.1214/20-EJS1754

Information

Received: 1 December 2019; Published: 2020
First available in Project Euclid: 1 October 2020

zbMATH: 07270271
MathSciNet: MR4155965
Digital Object Identifier: 10.1214/20-EJS1754

Subjects:
Primary: 62G35
Secondary: 62G08

Keywords: minimax-rate-optimality, outliers, regularized empirical risk minimizers, robustness

Vol. 14 • No. 2 • 2020