Bernoulli

Volume 19, Number 5B (2013), 2277-2293.

Multi-stage convex relaxation for feature selection

Tong Zhang

Abstract

A number of recent works have studied the effectiveness of feature selection using Lasso. It is known that under the restricted isometry property (RIP), Lasso does not in general recover the exact set of nonzero coefficients, owing to the looseness of the convex relaxation. This paper considers the feature selection property of nonconvex regularization, where the solution is computed by a multi-stage convex relaxation scheme. The nonconvex regularizer requires two tuning parameters (compared with one for Lasso). Although the method is more complex than Lasso, we show that under appropriate conditions, including a dependence of one tuning parameter on the size of the support set, the local solution obtained by this procedure recovers the set of nonzero coefficients without suffering from the bias of the Lasso relaxation. This complements the parameter estimation results for this procedure in (J. Mach. Learn. Res. 11 (2010) 1081–1107).
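To make the abstract's description concrete, here is a minimal Python sketch of a multi-stage convex relaxation. It assumes, for illustration only, that the nonconvex regularizer is the capped-$\ell_{1}$ penalty $\lambda \sum_j \min(|w_j|, \theta)$, whose two tuning parameters are $\lambda$ and $\theta$, and that each stage solves a weighted Lasso with objective $\frac{1}{2n}\|y - Xw\|_2^2 + \sum_j \lambda_j |w_j|$. Stage 1 is an ordinary Lasso. After that, a coordinate keeps the penalty $\lambda$ only if its previous estimate satisfies $|w_j| \le \theta$. The function names `weighted_lasso` and `multi_stage_capped_l1`, the coordinate-descent solver and the objective scaling are hypothetical choices, not taken from the paper, and the sketch says nothing about the conditions under which support recovery holds.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lams, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent (hypothetical helper) for
    min_w (1/(2n)) * ||y - X w||^2 + sum_j lams[j] * |w_j|.
    X is a list of n rows, each a list of d floats."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    r = list(y)  # residual y - X w; w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the fit
            # Correlation of column j with the partial residual (w_j removed).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lams[j]) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def multi_stage_capped_l1(X, y, lam, theta, n_stages=5):
    """Hypothetical multi-stage convex relaxation for the assumed capped-l1
    penalty lam * sum_j min(|w_j|, theta).

    Stage 1 penalises every coordinate by lam (plain Lasso). Each later stage
    drops the penalty on coordinates whose previous estimate exceeds theta in
    magnitude, so large coefficients are no longer shrunk."""
    lams = [lam] * len(X[0])
    w = None
    for _ in range(n_stages):
        w = weighted_lasso(X, y, lams)
        new_lams = [lam if abs(wj) <= theta else 0.0 for wj in w]
        if new_lams == lams:  # penalty weights stopped changing
            break
        lams = new_lams
    return w


if __name__ == "__main__":
    # Toy design: two orthogonal features; the second carries only a small signal.
    X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    y = [3.0, 3.0, 0.1, 0.1]
    print(weighted_lasso(X, y, [0.5, 0.5]))                 # [2.0, 0.0]
    print(multi_stage_capped_l1(X, y, lam=0.5, theta=1.0))  # [3.0, 0.0]
```

On this toy design the plain Lasso shrinks the large coefficient from its least-squares value 3.0 to 2.0, while the second stage removes the penalty on that coordinate and returns 3.0, which illustrates the bias reduction the abstract refers to. Warm-starting each stage from the previous solution would usually be faster; it is omitted here to keep the stages independent.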

Article information

Source
Bernoulli, Volume 19, Number 5B (2013), 2277-2293.

Dates
First available in Project Euclid: 3 December 2013

Permanent link to this document
https://projecteuclid.org/euclid.bj/1386078603

Digital Object Identifier
doi:10.3150/12-BEJ452

Mathematical Reviews number (MathSciNet)
MR3160554

Zentralblatt MATH identifier
1359.62293

Citation

Zhang, Tong. Multi-stage convex relaxation for feature selection. Bernoulli 19 (2013), no. 5B, 2277–2293. doi:10.3150/12-BEJ452. https://projecteuclid.org/euclid.bj/1386078603

References

• [1] Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• [3] Candès, E.J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
• [4] Candès, E.J., Wakin, M.B. and Boyd, S.P. (2008). Enhancing sparsity by reweighted $\ell_{1}$ minimization. J. Fourier Anal. Appl. 14 877–905.
• [5] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• [6] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45 7–57.
• [7] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
• [8] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [9] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• [10] van de Geer, S.A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• [11] Wainwright, M.J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• [12] Wipf, D.P. and Nagarajan, S. (2010). Iterative reweighted $\ell_{1}$ and $\ell_{2}$ methods for finding sparse solutions. IEEE J. Sel. Top. Signal Process. 4 317–329.
• [13] Zhang, C.H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
• [14] Zhang, C.H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
• [15] Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_{1}$ regularization. Ann. Statist. 37 2109–2144.
• [16] Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
• [17] Zhang, T. (2011). Adaptive forward–backward greedy algorithm for learning sparse representations. IEEE Trans. Inform. Theory 57 4689–4708.
• [18] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
• [19] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
• [20] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.