Electronic Journal of Statistics

Density estimation with contamination: minimax rates and theory of adaptation

Haoyang Liu and Chao Gao

Full-text: Open access

Abstract

This paper studies density estimation under pointwise loss in the setting of a contamination model. The goal is to estimate $f(x_{0})$ at some $x_{0}\in\mathbb{R}$ from i.i.d. contaminated observations \[X_{1},\dots,X_{n}\sim (1-\epsilon)f+\epsilon g,\] where $g$ is a contamination distribution. We closely track the effect of contamination through the following model indices: the contamination proportion $\epsilon$, the smoothness $\beta_{0}$ of the target density, the smoothness $\beta_{1}$ of the contamination density, and the local level of contamination $m$ such that $g(x_{0})\leq m$. The local effect of contamination is shown to depend intricately on the interplay of these parameters. In particular, under a minimax framework, \[[\epsilon^{2}(1\wedge m)^{2}]\vee[n^{-\frac{2\beta_{1}}{2\beta_{1}+1}}\epsilon^{\frac{2}{2\beta_{1}+1}}]\] is shown to be the optimal cost of contamination, relative to the usual minimax rate without contamination. The lower bound relies on a novel construction that perturbs a density function at two different resolutions. Such a construction may be of independent interest for studying the local effect of contamination in other nonparametric estimation problems. We also study the setting without any assumption on the contamination distribution, where the minimax cost of contamination is shown to be \[\epsilon^{\frac{2\beta_{0}}{\beta_{0}+1}}.\] Finally, the minimax cost of adaptation is established for both smooth and arbitrary contamination. Under arbitrary contamination, we show that while adaptation to either the contamination proportion or the smoothness costs only a logarithmic factor, adaptation to both is impossible.
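To make the rates above concrete, the following Python sketch evaluates them up to constants and logarithmic factors. It is an illustration under stated assumptions, not the authors' code: it assumes the standard contamination-free rate $n^{-2\beta_0/(2\beta_0+1)}$ for pointwise squared-error estimation of a $\beta_0$-smooth density, and it reads the overall risk as the larger of that rate and the contamination cost. All function names are hypothetical.

```python
from typing import Optional


def usual_rate(n: int, beta0: float) -> float:
    """Assumed contamination-free rate n^(-2*beta0/(2*beta0+1)) for f(x0)."""
    return n ** (-2 * beta0 / (2 * beta0 + 1))


def smooth_contamination_cost(n: int, eps: float, beta1: float, m: float) -> float:
    """max(eps^2 * min(1, m)^2, n^(-2*beta1/(2*beta1+1)) * eps^(2/(2*beta1+1)))."""
    local_term = (eps * min(1.0, m)) ** 2
    resolution_term = n ** (-2 * beta1 / (2 * beta1 + 1)) * eps ** (2 / (2 * beta1 + 1))
    return max(local_term, resolution_term)


def arbitrary_contamination_cost(eps: float, beta0: float) -> float:
    """eps^(2*beta0/(beta0+1)): the cost when nothing is assumed about g."""
    return eps ** (2 * beta0 / (beta0 + 1))


def minimax_rate(n: int, beta0: float, eps: float,
                 beta1: Optional[float] = None, m: Optional[float] = None) -> float:
    """Hypothetical overall rate: the larger of the usual rate and the contamination cost."""
    base = usual_rate(n, beta0)
    if beta1 is None:  # arbitrary contamination
        return max(base, arbitrary_contamination_cost(eps, beta0))
    if m is None:
        raise ValueError("smooth contamination requires the local level m")
    return max(base, smooth_contamination_cost(n, eps, beta1, m))


print(minimax_rate(10_000, 1.0, 0.01, beta1=1.0, m=0.5))  # smooth contamination
print(minimax_rate(10_000, 1.0, 0.01))                    # arbitrary contamination
```

For example, with $n=10^4$, $\beta_0=\beta_1=1$, $\epsilon=0.01$ and $m=0.5$, the smooth-contamination cost is $10^{-4}$, below the usual rate $10^{-8/3}\approx 2.15\times 10^{-3}$, so under these assumptions the contamination does not change the rate; without any assumption on $g$, the cost $\epsilon^{2\beta_0/(\beta_0+1)}=10^{-2}$ dominates.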

Article information

Source
Electron. J. Statist., Volume 13, Number 2 (2019), 3613-3653.

Dates
Received: December 2018
First available in Project Euclid: 1 October 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1569895284

Digital Object Identifier
doi:10.1214/19-EJS1617

Keywords
Minimax rate; nonparametric functional estimation; adaptive estimation; contamination model; robust statistics; Lepski’s method

Rights
Creative Commons Attribution 4.0 International License.

Citation

Liu, Haoyang; Gao, Chao. Density estimation with contamination: minimax rates and theory of adaptation. Electron. J. Statist. 13 (2019), no. 2, 3613–3653. doi:10.1214/19-EJS1617. https://projecteuclid.org/euclid.ejs/1569895284

