The Annals of Statistics

Variable selection with Hamming loss

Cristina Butucea, Mohamed Ndaoud, Natalia A. Stepanova, and Alexandre B. Tsybakov



We derive nonasymptotic bounds for the minimax risk of variable selection under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^{d}$ for classes of at most $s$-sparse vectors separated from 0 by a constant $a>0$. In some cases, we get exact expressions for the nonasymptotic minimax risk as a function of $d,s,a$ and find explicitly the minimax selectors. These results are extended to dependent or non-Gaussian observations and to the problem of crowdsourcing. Analogous conclusions are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we derive necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery. Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptively to the parameters of the classes.
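The setting described in the abstract can be made concrete with a small simulation: observe $X_j = \theta_j + \sigma\xi_j$ for an $s$-sparse mean vector whose nonzero entries have magnitude at least $a$, apply a coordinatewise selector, and count the Hamming loss against the true sparsity pattern. The sketch below uses a plain thresholding rule at $a/2$ as a generic stand-in; the dimensions, threshold, and variable names are illustrative assumptions, not the paper's minimax selector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not taken from the paper)
d, s, a, sigma = 1000, 20, 3.0, 1.0

# s-sparse mean vector with nonzero entries of magnitude a
theta = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
theta[support] = a

# Gaussian mean model: one noisy observation per coordinate
x = theta + sigma * rng.standard_normal(d)

# Simple thresholding selector (a generic stand-in, not the minimax
# selector of the paper): declare coordinate j active if |x_j| >= a/2
eta_hat = (np.abs(x) >= a / 2).astype(int)
eta_true = (theta != 0).astype(int)

# Hamming loss: number of coordinates where the selected pattern is wrong
hamming_loss = int(np.sum(eta_hat != eta_true))
print(hamming_loss)
```

Varying $a$ for fixed $d$ and $s$ in such a simulation is one way to see the phase transitions the abstract refers to: for small $a$ the loss is of order $s$ or worse, while for large $a$ it drops to zero.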

Article information

Ann. Statist., Volume 46, Number 5 (2018), 1837–1875.

Received: December 2015
Revised: March 2017
First available in Project Euclid: 17 August 2018


Primary: 62G05 (Estimation); 62G08 (Nonparametric regression); 62G20 (Asymptotic properties)

Keywords: Adaptive variable selection; almost full recovery; exact recovery; Hamming loss; minimax selectors; nonasymptotic minimax selection bounds; phase transitions


Butucea, Cristina; Ndaoud, Mohamed; Stepanova, Natalia A.; Tsybakov, Alexandre B. Variable selection with Hamming loss. Ann. Statist. 46 (2018), no. 5, 1837--1875. doi:10.1214/17-AOS1572.



  • [1] Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. In Wavelets and Statistics, Lecture Notes in Statistics 103 5–14. Springer, New York.
  • [2] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
  • [3] Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. U.S. Government Printing Office, Washington, D.C.
  • [4] Arias-Castro, E. and Chen, S. (2017). Distribution-free multiple testing. Electron. J. Stat. 11 1983–2001.
  • [5] Bertin, K. and Lecué, G. (2008). Selection of variables and dimension reduction in high-dimensional non-parametric regression. Electron. J. Stat. 2 1224–1241.
  • [6] Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 1103–1140.
  • [7] Butucea, C., Ingster, Y. I. and Suslina, I. A. (2015). Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM Probab. Stat. 19 115–134.
  • [8] Butucea, C., Ndaoud, M., Stepanova, N. A. and Tsybakov, A. B. (2018). Supplement to “Variable selection with Hamming loss.” DOI:10.1214/17-AOS1572SUPP.
  • [9] Butucea, C. and Stepanova, N. (2017). Adaptive variable selection in nonparametric sparse additive models. Electron. J. Stat. 11 2321–2357.
  • [10] Collier, O., Comminges, L., Tsybakov, A. B. and Verzelen, N. (2016). Optimal adaptive estimation of linear functionals under sparsity.
  • [11] Comminges, L. and Dalalyan, A. S. (2012). Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 2667–2696.
  • [12] Gao, C., Lu, Y. and Zhou, D. (2016). Exact exponent in optimal rates for crowdsourcing.
  • [13] Genovese, C. R., Jin, J., Wasserman, L. and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learn. Res. 13 2107–2143.
  • [14] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
  • [15] Ingster, Y. I. and Stepanova, N. A. (2014). Adaptive variable selection in nonparametric sparse regression. J. Math. Sci. 199 184–201.
  • [16] Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 73–103.
  • [17] Jin, J., Zhang, C.-H. and Zhang, Q. (2014). Optimality of graphlet screening in high dimensional variable selection. J. Mach. Learn. Res. 15 2723–2772.
  • [18] Lafferty, J. and Wasserman, L. (2008). Rodeo: Sparse, greedy nonparametric regression. Ann. Statist. 36 28–63.
  • [19] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
  • [20] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
  • [21] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • [22] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 417–473.
  • [23] Neuvial, P. and Roquain, E. (2012). On false discovery rate thresholding for classification under sparsity. Ann. Statist. 40 2572–2600.
  • [24] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • [25] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
  • [26] Zhang, A. Y. and Zhou, H. H. (2016). Minimax rates of community detection in stochastic block models. Ann. Statist. 44 2252–2280.
  • [27] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • [28] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.

Supplemental materials

  • Supplement to “Variable selection with Hamming loss”. We derive a general lower bound for the minimax risk over all selectors on the class of at most $s$-sparse vectors. The main term of this bound is a Bayes risk with an arbitrary prior, and the nonasymptotic remainder term is given explicitly. Using this, we prove the lower bounds of Theorems 2.2, 3.2 and 3.3.
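For orientation, the structure of such a lower bound can be written schematically; the notation here is generic and not copied from the supplement. The maximum Hamming risk of any selector $\hat{\eta}$ over a class $\Theta$ dominates its Bayes risk under any prior $\pi$ supported on $\Theta$, which is the mechanism by which a bound with a Bayes main term arises:

```latex
% Hamming loss between a selector \hat\eta and the true sparsity pattern \eta(\theta)
|\hat{\eta} - \eta(\theta)| \;=\; \sum_{j=1}^{d} \mathbf{1}\bigl(\hat{\eta}_j \neq \eta_j(\theta)\bigr),
\qquad
\inf_{\hat{\eta}}\,\sup_{\theta \in \Theta} \mathbf{E}_\theta\,|\hat{\eta} - \eta(\theta)|
\;\ge\;
\inf_{\hat{\eta}} \int \mathbf{E}_\theta\,|\hat{\eta} - \eta(\theta)|\,\pi(d\theta).
```

When the prior is not supported exactly on the class, an explicit nonasymptotic remainder term must be subtracted from the right-hand side, as the supplement's statement indicates.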