## The Annals of Statistics

### Tight conditions for consistency of variable selection in the context of high dimensionality

#### Abstract

We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called the intrinsic dimension, is much smaller than the ambient dimension $d$. Without assuming any parametric form of the underlying regression function, we obtain tight conditions under which the set of relevant variables can be consistently estimated. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values.
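The thresholding idea in the last sentence can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a design uniform on $[0,1]^d$, uses a trigonometric basis, and only looks at frequencies supported on a single coordinate, so it can only detect variables with a nonzero main effect. The function names, `max_freq`, and the threshold value `0.05` are hypothetical choices made for illustration.

```python
import math
import random


def quadratic_functional(xs, ys, max_freq):
    """Sum of squared empirical Fourier coefficients of ys along one coordinate.

    Assumption: the basis functions are sqrt(2)*cos(2*pi*m*x) and
    sqrt(2)*sin(2*pi*m*x) for m = 1..max_freq, orthonormal on [0, 1].
    """
    n = len(ys)
    total = 0.0
    for m in range(1, max_freq + 1):
        w = 2.0 * math.pi * m
        c = sum(y * math.sqrt(2) * math.cos(w * x) for x, y in zip(xs, ys)) / n
        s = sum(y * math.sqrt(2) * math.sin(w * x) for x, y in zip(xs, ys)) / n
        total += c * c + s * s
    return total


def select_variables(X, y, max_freq=3, threshold=0.05):
    """Return the indices whose quadratic functional exceeds the threshold.

    Simplification: only single-coordinate frequencies are examined, and the
    threshold is a hypothetical constant rather than a calibrated value.
    """
    mean_y = sum(y) / len(y)
    y_centered = [v - mean_y for v in y]
    selected = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if quadratic_functional(column, y_centered, max_freq) > threshold:
            selected.append(j)
    return selected


# Toy data: d = 20 variables, of which only coordinates 3 and 17 are relevant.
random.seed(0)
n, d = 1000, 20
X = [[random.random() for _ in range(d)] for _ in range(n)]
y = [
    math.sin(2 * math.pi * row[3])
    + math.cos(2 * math.pi * row[17])
    + 0.1 * random.gauss(0.0, 1.0)
    for row in X
]
selected = select_variables(X, y)
print(selected)
```

On this toy example the statistic is close to $0.5$ for the two relevant coordinates and of order $1/n$ for the others, so a fixed threshold separates them easily. Calibrating the threshold in terms of $n$, $d$ and $s$ is the difficult part and is not attempted here.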

The asymptotic analysis reveals the presence of two quite different regimes. The first regime is when the intrinsic dimension is fixed. In this case the situation in nonparametric regression is the same as in linear regression, that is, consistent variable selection is possible if and only if $\log d$ is small compared to the sample size $n$. The picture is different in the second regime, that is, when the number of relevant variables, denoted by $s$, tends to infinity as $n\to\infty$. Then we prove that consistent variable selection in the nonparametric set-up is possible only if $s+\log\log d$ is small compared to $\log n$. We apply these results to derive minimax separation rates for the problem of variable selection.
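Read informally, with "small compared to" taken to mean that the ratio is small (an interpretation, since the abstract does not state the exact constants), the two regimes can be summarized as:

$$
\text{fixed } s:\quad \frac{\log d}{n}\ \text{small};
\qquad\qquad
s\to\infty:\quad \frac{s+\log\log d}{\log n}\ \text{small}.
$$

As an illustrative calculation with natural logarithms, take $n = 10^{6}$ and $d = 10^{100}$. Then $\log n \approx 13.8$, $\log d \approx 230.3$ and $\log\log d \approx 5.4$. In the first regime the ratio $\log d / n \approx 2.3\times 10^{-4}$ is tiny, while in the second regime $s + 5.4$ must remain well below $13.8$, which leaves room for only a handful of relevant variables. The ambient dimension enters the second condition only through $\log\log d$, whereas the intrinsic dimension enters linearly.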

#### Article information

Source
Ann. Statist., Volume 40, Number 5 (2012), 2667-2696.

Dates
First available in Project Euclid: 4 February 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1359987534

Digital Object Identifier
doi:10.1214/12-AOS1046

Mathematical Reviews number (MathSciNet)
MR3097616

Zentralblatt MATH identifier
1373.62154

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62H12: Estimation 62H15: Hypothesis testing

#### Citation

Comminges, Laëtitia; Dalalyan, Arnak S. Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 (2012), no. 5, 2667–2696. doi:10.1214/12-AOS1046. https://projecteuclid.org/euclid.aos/1359987534

#### Supplemental materials

• Supplementary material: Proofs of some results. The supplementary material provides the proofs of Theorem 3, Proposition 7, Lemma 10 and Corollary 3, as well as those of some technical lemmas.