Annals of Statistics

Minimax estimation of linear and quadratic functionals on sparsity classes

Abstract

For the Gaussian sequence model, we obtain nonasymptotic minimax rates of estimation of the linear, quadratic and the $\ell_{2}$-norm functionals on classes of sparse vectors and construct optimal estimators that attain these rates. The main object of interest is the class $B_{0}(s)$ of $s$-sparse vectors $\theta=(\theta_{1},\dots,\theta_{d})$, for which we also provide completely adaptive estimators (independent of $s$ and of the noise variance $\sigma$) having logarithmically slower rates than the minimax ones. Furthermore, we obtain the minimax rates on the $\ell_{q}$-balls $B_{q}(r)=\{\theta\in\mathbb{R}^{d}:\|\theta\|_{q}\le r\}$ where $0<q\le2$, and $\|\theta\|_{q}=(\sum_{i=1}^{d}|\theta_{i}|^{q})^{1/q}$. This analysis shows that there are, in general, three zones in the rates of convergence that we call the sparse zone, the dense zone and the degenerate zone, while a fourth zone appears for estimation of the quadratic functional. We show that, as opposed to estimation of $\theta$, the correct logarithmic terms in the optimal rates for the sparse zone scale as $\log(d/s^{2})$ and not as $\log(d/s)$. For the class $B_{0}(s)$, the rates of estimation of the linear functional and of the $\ell_{2}$-norm have a simple elbow at $s=\sqrt{d}$ (boundary between the sparse and the dense zones) and exhibit similar performances, whereas the estimation of the quadratic functional $Q(\theta)$ reveals more complex effects: the minimax risk on $B_{0}(s)$ is infinite and the sparseness assumption needs to be combined with a bound on the $\ell_{2}$-norm. Finally, we apply our results on estimation of the $\ell_{2}$-norm to the problem of testing against sparse alternatives. In particular, we obtain a nonasymptotic analog of the Ingster–Donoho–Jin theory revealing some effects that were not captured by the previous asymptotic analysis.
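
As a rough illustration of the elbow at $s=\sqrt{d}$ and of the $\log(d/s^{2})$ scaling for the linear functional $L(\theta)=\sum_{i}\theta_{i}$, the following Python sketch simulates observations $y_{i}=\theta_{i}+\sigma\xi_{i}$ with standard Gaussian $\xi_{i}$ and compares a thresholded sum with the plain sum. The abstract does not state the estimator or the exact rate. The threshold $\sigma\sqrt{2\log(1+d/s^{2})}$, the rate expression $\sigma^{2}s^{2}\log(1+d/s^{2})$, and all function names below are assumptions chosen to be consistent with the scaling described above, not a reproduction of the paper's construction.

```python
import math
import random


def threshold(d, s, sigma):
    """Assumed hard-threshold level: sigma * sqrt(2 * log(1 + d / s^2))."""
    return sigma * math.sqrt(2.0 * math.log(1.0 + d / s**2))


def rate(d, s, sigma):
    """Assumed rate for the linear functional: sigma^2 * s^2 * log(1 + d / s^2)."""
    return sigma**2 * s**2 * math.log(1.0 + d / s**2)


def estimate_linear_functional(y, s, sigma):
    """Estimate L(theta) = sum(theta_i) from noisy observations y.

    Sparse zone (s < sqrt(d)): sum only the coordinates whose absolute
    value exceeds the threshold. Dense zone (s >= sqrt(d)): plain sum.
    """
    d = len(y)
    if s >= math.sqrt(d):
        return sum(y)
    t = threshold(d, s, sigma)
    return sum(v for v in y if abs(v) > t)


def simulate(d=10_000, s=10, sigma=1.0, signal=8.0, seed=0):
    """Draw one s-sparse instance; return (truth, thresholded sum, plain sum)."""
    rng = random.Random(seed)
    theta = [signal] * s + [0.0] * (d - s)
    y = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
    return sum(theta), estimate_linear_functional(y, s, sigma), sum(y)


if __name__ == "__main__":
    truth, thresholded, plain = simulate()
    print("truth:", truth)
    print("thresholded estimate:", thresholded)
    print("plain-sum estimate:", plain)
    print("assumed rate (squared-error scale):", rate(10_000, 10, 1.0))
```

At $s=\sqrt{d}$ the assumed rate equals $\sigma^{2}d\log 2$, which matches the variance $\sigma^{2}d$ of the plain sum up to a constant. This is one way to see why the two regimes meet at that boundary.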

Article information

Source
Ann. Statist., Volume 45, Number 3 (2017), 923–958.

Dates
Revised: October 2015
First available in Project Euclid: 13 June 2017

https://projecteuclid.org/euclid.aos/1497319684

Digital Object Identifier
doi:10.1214/15-AOS1432

Mathematical Reviews number (MathSciNet)
MR3662444

Zentralblatt MATH identifier
1368.62191

Subjects
Primary: 62J05: Linear regression; 62G05: Estimation

Citation

Collier, Olivier; Comminges, Laëtitia; Tsybakov, Alexandre B. Minimax estimation of linear and quadratic functionals on sparsity classes. Ann. Statist. 45 (2017), no. 3, 923–958. doi:10.1214/15-AOS1432. https://projecteuclid.org/euclid.aos/1497319684

References

• [1] Abramovich, F. and Grinshtein, V. (2010). MAP model selection in Gaussian regression. Electron. J. Stat. 4 932–949.
• [2] Aldous, D. J. (1985). Exchangeability and related topics. In École D’été de Probabilités de Saint-Flour, XIII—1983. Lecture Notes in Math. 1117 1–198. Springer, Berlin.
• [3] Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533–2556.
• [4] Baraud, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606.
• [5] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 203–268.
• [6] Birnbaum, Z. W. (1942). An inequality for Mill’s ratio. Ann. Math. Stat. 13 245–246.
• [7] Butucea, C. (2007). Goodness-of-fit testing and quadratic functional estimation from indirect observations. Ann. Statist. 35 1907–1930.
• [8] Butucea, C. and Comte, F. (2009). Adaptive estimation of linear functionals in the convolution model and applications. Bernoulli 15 69–98.
• [9] Cai, T. T. and Low, M. G. (2004). Minimax estimation of linear functionals over nonconvex parameter spaces. Ann. Statist. 32 552–576.
• [10] Cai, T. T. and Low, M. G. (2005). On adaptive estimation of linear functionals. Ann. Statist. 33 2311–2343.
• [11] Cai, T. T. and Low, M. G. (2005). Nonquadratic estimators of a quadratic functional. Ann. Statist. 33 2930–2956.
• [12] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
• [13] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $\ell_{p}$-balls for $\ell_{q}$-error. Probab. Theory Related Fields 99 277–303.
• [14] Donoho, D. L. and Liu, R. (1991). Geometrizing rates of convergence. III. Ann. Statist. 19 668–701.
• [15] Donoho, D. L. and Nussbaum, M. (1990). Minimax quadratic estimation of a quadratic functional. J. Complexity 6 290–323.
• [16] Efromovich, S. and Low, M. G. (1996). On optimal adaptive estimation of a quadratic functional. Ann. Statist. 24 1106–1125.
• [17] Goldenshluger, A. and Pereverzev, S. V. (2003). On adaptive inverse estimation of linear functionals. Bernoulli 9 783–807.
• [18] Golubev, G. K. (2004). The method of risk envelopes in the estimation of linear functionals. Problemy Peredachi Informatsii 40 58–72.
• [19] Golubev, Y. and Levit, B. (2004). An oracle approach to adaptive estimation of linear functionals in a Gaussian model. Math. Methods Statist. 13 392–408 (2005).
• [20] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
• [21] Ibragimov, I. A. and Hasminskii, R. Z. (1984). Nonparametric estimation of the value of a linear functional in Gaussian white noise. Theory Probab. Appl. 29 19–32.
• [22] Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
• [23] Ingster, Y. I., Pouet, C. and Tsybakov, A. B. (2009). Classification of sparse high-dimensional vectors. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 4427–4448.
• [24] Ingster, Y. I. and Suslina, I. A. (2003). Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Lecture Notes in Statist. 169. Springer, New York.
• [25] Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
• [26] Johnstone, I. (2001). Thresholding for weighted $\chi^{2}$. Statist. Sinica 11 691–704.
• [27] Johnstone, I. M. (2001). Chi-square oracle inequalities. IMS Lecture Notes Monogr. Ser. 36 399–418.
• [28] Johnstone, I. M. (2013). Gaussian Estimation: Sequence and Wavelet Models. Book draft.
• [29] Juditsky, A. and Nemirovski, A. (2009). Nonparametric estimation by convex programming. Ann. Statist. 37 2278–2300.
• [30] Klemelä, J. (2006). Sharp adaptive estimation of quadratic functionals. Probab. Theory Related Fields 134 539–564.
• [31] Klemelä, J. and Tsybakov, A. B. (2001). Sharp adaptive estimation of linear functionals. Ann. Statist. 29 1567–1600.
• [32] Laurent, B., Ludeña, C. and Prieur, C. (2008). Adaptive estimation of linear functionals by model selection. Electron. J. Stat. 2 993–1020.
• [33] Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1302–1338.
• [34] Lepski, O., Nemirovski, A. and Spokoiny, V. (1999). On estimation of the $L_{r}$ norm of a regression function. Probab. Theory Related Fields 113 221–253.
• [35] Le Cam, L. (1973). Convergence of estimates under dimensionality restrictions. Ann. Statist. 1 38–53.
• [36] Nemirovski, A. (2000). Topics in Nonparametric Statistics. École d’été de Probabilités de Saint-Flour 1998. Lecture Notes in Math. 1738. Springer, New York.
• [37] Rigollet, P. and Tsybakov, A. B. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771.
• [38] Sampford, M. R. (1953). Some inequalities on Mills ratio and related functions. Ann. Math. Stat. 24 132–134.
• [39] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.
• [40] Verzelen, N. (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron. J. Stat. 6 38–90.