## The Annals of Statistics

### Variable selection with Hamming loss

#### Abstract

We derive nonasymptotic bounds for the minimax risk of variable selection under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^{d}$ for classes of at most $s$-sparse vectors separated from 0 by a constant $a>0$. In some cases, we get exact expressions for the nonasymptotic minimax risk as a function of $d,s,a$ and find explicitly the minimax selectors. These results are extended to dependent or non-Gaussian observations and to the problem of crowdsourcing. Analogous conclusions are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we derive necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery. Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptively to the parameters of the classes.
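As a concrete illustration of the setting described above, the sketch below simulates the Gaussian mean model in $\mathbb{R}^d$ with an $s$-sparse mean vector whose nonzero entries have magnitude at least $a$, applies a simple coordinatewise thresholding selector, and evaluates it under Hamming loss. The threshold form $a/2 + (\sigma^2/a)\log(d/s - 1)$ mirrors the shape of thresholds analyzed in this line of work, but the specific values of $d, s, a, \sigma$ and the selector itself are illustrative assumptions, not the paper's exact minimax construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (chosen for the demo, not taken from the paper)
d, s, a, sigma = 1000, 10, 3.0, 1.0

# s-sparse mean vector, nonzero entries separated from 0 by a > 0
theta = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
theta[support] = a

# Gaussian mean model: one noisy observation per coordinate
y = theta + sigma * rng.standard_normal(d)

# Coordinatewise thresholding selector (illustrative; the exact
# minimax threshold is derived in the paper)
t = a / 2 + (sigma**2 / a) * np.log(d / s - 1)
eta_hat = (np.abs(y) >= t).astype(int)

# Hamming loss: number of coordinates where the recovered sparsity
# pattern disagrees with the true one
eta = (theta != 0).astype(int)
hamming = int(np.sum(eta_hat != eta))
print(f"threshold = {t:.3f}, Hamming loss = {hamming}")
```

Exact recovery of the sparsity pattern corresponds to a Hamming loss of zero; almost full recovery corresponds to a loss that is small relative to $s$.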

#### Article information

**Source:** Ann. Statist., Volume 46, Number 5 (2018), 1837–1875.

**Dates:** Revised March 2017; first available in Project Euclid 17 August 2018.

**Permanent link:** https://projecteuclid.org/euclid.aos/1534492821

**Digital Object Identifier:** doi:10.1214/17-AOS1572

**Mathematical Reviews number (MathSciNet):** MR3845003

**Zentralblatt MATH identifier:** 06964318

#### Citation

Butucea, Cristina; Ndaoud, Mohamed; Stepanova, Natalia A.; Tsybakov, Alexandre B. Variable selection with Hamming loss. Ann. Statist. 46 (2018), no. 5, 1837–1875. doi:10.1214/17-AOS1572. https://projecteuclid.org/euclid.aos/1534492821

#### References

• [1] Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. In Wavelets and Statistics, Lecture Notes in Statistics 103 5–14. Springer, New York.
• [2] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
• [3] Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. U.S. Government Printing Office, Washington, D.C.
• [4] Arias-Castro, E. and Chen, S. (2017). Distribution-free multiple testing. Electron. J. Stat. 11 1983–2001.
• [5] Bertin, K. and Lecué, G. (2008). Selection of variables and dimension reduction in high-dimensional non-parametric regression. Electron. J. Stat. 2 1224–1241.
• [6] Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 1103–1140.
• [7] Butucea, C., Ingster, Y. I. and Suslina, I. A. (2015). Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM Probab. Stat. 19 115–134.
• [8] Butucea, C., Ndaoud, M., Stepanova, N. A. and Tsybakov, A. B. (2018). Supplement to “Variable selection with Hamming loss.” DOI:10.1214/17-AOS1572SUPP.
• [9] Butucea, C. and Stepanova, N. (2017). Adaptive variable selection in nonparametric sparse additive models. Electron. J. Stat. 11 2321–2357.
• [10] Collier, O., Comminges, L., Tsybakov, A. B. and Verzelen, N. (2016). Optimal adaptive estimation of linear functionals under sparsity. http://arxiv.org/abs/1611.09744.
• [11] Comminges, L. and Dalalyan, A. S. (2012). Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 2667–2696.
• [12] Gao, C., Lu, Y. and Zhou, D. (2016). Exact exponent in optimal rates for crowdsourcing. http://arxiv.org/abs/1605.07696.
• [13] Genovese, C. R., Jin, J., Wasserman, L. and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learn. Res. 13 2107–2143.
• [14] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
• [15] Ingster, Y. I. and Stepanova, N. A. (2014). Adaptive variable selection in nonparametric sparse regression. J. Math. Sci. 199 184–201.
• [16] Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 73–103.
• [17] Jin, J., Zhang, C.-H. and Zhang, Q. (2014). Optimality of graphlet screening in high dimensional variable selection. J. Mach. Learn. Res. 15 2723–2772.
• [18] Lafferty, J. and Wasserman, L. (2008). Rodeo: Sparse, greedy nonparametric regression. Ann. Statist. 36 28–63.
• [19] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
• [20] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
• [21] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [22] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 417–473.
• [23] Neuvial, P. and Roquain, E. (2012). On false discovery rate thresholding for classification under sparsity. Ann. Statist. 40 2572–2600.
• [24] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• [25] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
• [26] Zhang, A. Y. and Zhou, H. H. (2016). Minimax rates of community detection in stochastic block models. Ann. Statist. 44 2252–2280.
• [27] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
• [28] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.

#### Supplemental materials

• Supplement to “Variable selection with Hamming loss”. We derive a general lower bound for the minimax risk over all selectors on the class of at most $s$-sparse vectors. The main term of this bound is a Bayes risk with an arbitrary prior, and the nonasymptotic remainder term is given explicitly. Using this bound, we prove the lower bounds of Theorems 2.2, 3.2 and 3.3.