The Annals of Statistics

Optimal computational and statistical rates of convergence for sparse nonconvex learning problems

Zhaoran Wang, Han Liu, and Tong Zhang


Abstract

We provide a theoretical analysis of the statistical and computational properties of penalized $M$-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall into this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path-following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution attained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which attain geometric rates of convergence only for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound that sharply characterizes the performance of each stage along the regularization path. Statistically, we provide a sharp sample complexity analysis for all the approximate local solutions along the regularization path. Our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of the nonconvex penalty.
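To make the path-following idea concrete, the sketch below shows one way such a scheme can be organized: the regularization parameter is decreased geometrically from the value at which the all-zero solution is optimal down to the target value, and each stage runs warm-started proximal gradient steps until a precision proportional to the current parameter is reached. This is a minimal illustration under assumptions not taken from the paper (squared loss, the MCP penalty, a fixed step size); the names `path_following` and `mcp_prox` and the parameters `b`, `decay`, and `inner_tol` are illustrative, not the authors' notation or exact algorithm.

```python
# Minimal sketch of an approximate regularization path-following scheme for
# nonconvex penalized least squares. Illustrative only: squared loss, MCP
# penalty with concavity parameter b, fixed step size eta (requires eta < b).

import numpy as np

def mcp_prox(z, lam, b, eta):
    """Closed-form proximal operator of the MCP penalty (valid for eta < b)."""
    soft = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)
    inner = soft / (1.0 - eta / b)        # less shrinkage than the lasso on |z| <= b*lam
    return np.where(np.abs(z) > b * lam, z, inner)

def path_following(X, y, lam_target, b=3.0, decay=0.9, inner_tol=0.1, max_inner=500):
    n, d = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2   # step size = 1 / Lipschitz constant of the loss gradient
    beta = np.zeros(d)
    lam = np.max(np.abs(X.T @ y)) / n     # lambda_max: the all-zero vector is a solution here
    while lam > lam_target:
        lam = max(lam * decay, lam_target)      # geometrically decrease the regularization parameter
        for _ in range(max_inner):              # warm-started proximal gradient stage
            grad = X.T @ (X @ beta - y) / n
            beta_new = mcp_prox(beta - eta * grad, lam, b, eta)
            converged = np.max(np.abs(beta_new - beta)) <= inner_tol * lam * eta
            beta = beta_new
            if converged:                       # heuristic stop: precision proportional to current lam
                break
    return beta
```

Warm-starting each stage at the previous stage's solution is what allows the entire path to be computed with essentially the same iteration complexity as a single stage; this is the computational phenomenon the paper quantifies.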

Article information

Source
Ann. Statist., Volume 42, Number 6 (2014), 2164-2201.

Dates
First available in Project Euclid: 20 October 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1413810725

Digital Object Identifier
doi:10.1214/14-AOS1238

Mathematical Reviews number (MathSciNet)
MR3269977

Zentralblatt MATH identifier
1302.62066

Subjects
Primary: 62F30: Inference under constraints; 90C26: Nonconvex programming, global optimization
Secondary: 62J12: Generalized linear models; 90C52: Methods of reduced gradient type

Keywords
Nonconvex regularized $M$-estimation; path-following method; geometric computational rate; optimal statistical rate

Citation

Wang, Zhaoran; Liu, Han; Zhang, Tong. Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Statist. 42 (2014), no. 6, 2164--2201. doi:10.1214/14-AOS1238. https://projecteuclid.org/euclid.aos/1413810725


References

  • Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40 2452–2482.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Blumensath, T. and Davies, M. E. (2009). Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27 265–274.
  • Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253.
  • Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J., Xue, L. and Zou, H. (2014). Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 819–849.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
  • Hastie, T., Rosset, S., Tibshirani, R. and Zhu, J. (2004). The entire regularization path for the support vector machine. J. Mach. Learn. Res. 5 1391–1415.
  • Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642.
  • Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665–1673.
  • Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45 7–57.
  • Liu, W. and Luo, X. (2012). High-dimensional sparse precision matrix estimation via sparse column inverse operator. Preprint. Available at arXiv:1203.3896.
  • Loh, P.-L. and Wainwright, M. J. (2013). Regularized $M$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Preprint. Available at arXiv:1305.2436.
  • Mairal, J. and Yu, B. (2012). Complexity analysis of the lasso regularization path. Preprint. Available at arXiv:1205.0079.
  • Mazumder, R., Friedman, J. H. and Hastie, T. (2011). SparseNet: Coordinate descent with nonconvex penalties. J. Amer. Statist. Assoc. 106 1125–1138.
  • Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
  • Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization 87. Springer, New York.
  • Nesterov, Yu. (2013). Gradient methods for minimizing composite functions. Math. Program. 140 125–161.
  • Park, M. Y. and Hastie, T. (2007). $L_1$-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 659–677.
  • Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
  • Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls. IEEE Trans. Inform. Theory 57 6976–6994.
  • Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist. 35 1012–1030.
  • Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
  • She, Y. (2009). Thresholding-based iterative selection procedures for model selection and shrinkage. Electron. J. Stat. 3 384–415.
  • She, Y. (2012). An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Comput. Statist. Data Anal. 56 2976–2990.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
  • van de Geer, S. (2000). Empirical Processes in $M$-Estimation 45. Cambridge Univ. Press, Cambridge.
  • van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
  • Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • Wang, L., Kim, Y. and Li, R. (2013). Calibrating nonconvex penalized regression in ultra-high dimension. Ann. Statist. 41 2505–2536.
  • Wang, Z., Liu, H. and Zhang, T. (2014). Supplement to “Optimal computational and statistical rates of convergence for sparse nonconvex learning problems.” DOI:10.1214/14-AOS1238SUPP.
  • Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc. 107 214–222.
  • Wright, S. J., Nowak, R. D. and Figueiredo, M. A. T. (2009). Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57 2479–2493.
  • Xiao, L. and Zhang, T. (2013). A proximal-gradient homotopy method for the sparse least-squares problem. SIAM J. Optim. 23 1062–1091.
  • Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_1$ regularization. Ann. Statist. 37 2109–2144.
  • Zhang, C.-H. (2010a). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, T. (2010b). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
  • Zhang, T. (2013). Multi-stage convex relaxation for feature selection. Bernoulli 19 2277–2293.
  • Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576–593.
  • Zhao, P. and Yu, B. (2007). Stagewise lasso. J. Mach. Learn. Res. 8 2701–2726.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.

Supplemental materials

  • Supplementary material: Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. The detailed proofs are provided in the supplement [Wang, Liu and Zhang (2014)].