Electronic Journal of Statistics

Selection by partitioning the solution paths

Yang Liu and Peng Wang


Abstract

The performance of penalized likelihood approaches depends profoundly on the choice of the tuning parameter, yet there is no commonly agreed-upon criterion for selecting it. Moreover, penalized likelihood estimation based on a single value of the tuning parameter suffers from several drawbacks. This article introduces a novel approach to feature selection that is based on the entire solution paths rather than on a single tuning parameter, which significantly improves selection accuracy. The approach also allows feature selection with ridge or other strictly convex penalties. The key idea is to classify variables as relevant or irrelevant at each tuning parameter and then to select all variables that have been classified as relevant at least once. We establish the theoretical properties of the method, which require significantly weaker conditions than existing methods in the literature. We also illustrate the advantages of the proposed approach with simulation studies and a data example.
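The abstract alone does not specify how variables are partitioned at each tuning parameter, so the sketch below substitutes a simple stand-in rule: at each point on the lasso solution path, coefficient magnitudes are split into a "relevant" and an "irrelevant" group at the largest gap in their sorted values, and the final selection is the union of the relevant groups over the whole path. The function name sps_select and the gap-based classifier are illustrative assumptions, not the authors' method; only the union-over-the-path idea is taken from the abstract.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def sps_select(X, y, n_alphas=100):
    """Union-over-the-path selection, sketching the paper's idea.

    At each tuning parameter, coefficient magnitudes are partitioned
    into a 'relevant' and an 'irrelevant' group at the largest gap in
    their sorted values (an assumed stand-in rule); a variable is
    selected if it is classified as relevant at least once.
    """
    # Trace the full lasso solution path over a grid of tuning parameters.
    alphas, coefs, _ = lasso_path(X, y, n_alphas=n_alphas)  # coefs: (p, n_alphas)
    selected = np.zeros(X.shape[1], dtype=bool)
    for j in range(len(alphas)):
        beta = np.abs(coefs[:, j])
        order = np.argsort(beta)[::-1]      # indices sorted by |beta|, descending
        gaps = beta[order][:-1] - beta[order][1:]
        if gaps.size == 0 or gaps.max() <= 0:
            continue                        # all magnitudes equal (e.g. all zero)
        cut = int(np.argmax(gaps)) + 1      # partition at the largest gap
        selected[order[:cut]] = True        # accumulate the union over the path
    return np.flatnonzero(selected)

# Toy usage: three true signals among fifty predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta = np.zeros(50)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(200)
print(sps_select(X, y))  # should typically recover variables 0, 1, 2
```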

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1988-2017.

Dates
Received: January 2018
First available in Project Euclid: 18 June 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1529308885

Digital Object Identifier
doi:10.1214/18-EJS1434

Mathematical Reviews number (MathSciNet)
MR3815303

Zentralblatt MATH identifier
06917429

Subjects
Primary: 62F07: Ranking and selection
Secondary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62J86: Fuzziness, and linear inference and regression

Keywords
Penalized likelihood; lasso; variable/feature selection; solution paths; AIC/BIC; cross-validation; tuning

Rights
Creative Commons Attribution 4.0 International License.

Citation

Liu, Yang; Wang, Peng. Selection by partitioning the solution paths. Electron. J. Statist. 12 (2018), no. 1, 1988–2017. doi:10.1214/18-EJS1434. https://projecteuclid.org/euclid.ejs/1529308885


