## Electronic Journal of Statistics

### Quantile universal threshold

#### Abstract

Efficient recovery of a low-dimensional structure from high-dimensional data has been pursued in various settings including wavelet denoising, generalized linear models and low-rank matrix estimation. By thresholding some parameters to zero, estimators such as lasso, elastic net and subset selection perform variable selection. One crucial step challenges all these estimators: choosing the amount of thresholding, governed by a threshold parameter $\lambda$. If $\lambda$ is too large, important features are missed; if too small, spurious features are included. Within a unified framework, we propose selecting $\lambda$ at the detection edge. To that aim, we introduce the concepts of a zero-thresholding function and a null-thresholding statistic, which we derive explicitly for a large class of estimators. The new approach has the great advantage of transforming the selection of $\lambda$ from an unknown scale to a probabilistic scale. Numerical results show the effectiveness of our approach in terms of model selection and prediction.
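To make the idea concrete for one well-known case: for the lasso, the smallest $\lambda$ that sets every coefficient to zero has the closed form $\lambda_0(y) = \|X^\top y\|_\infty$, so the quantile-based selection amounts to taking an upper quantile of this statistic under a pure-noise null model. The sketch below is a minimal Monte Carlo illustration of that recipe, not the paper's implementation; the function names, the Gaussian null model with known $\sigma$, and the default level $\alpha$ are all assumptions made for the example.

```python
import numpy as np

def lasso_zero_threshold(X, y):
    # Zero-thresholding function of the lasso: the smallest penalty
    # lambda for which the lasso estimate is identically zero,
    # namely ||X^T y||_inf.
    return np.max(np.abs(X.T @ y))

def quantile_universal_threshold(X, alpha=0.05, sigma=1.0, n_mc=1000, seed=None):
    # Monte Carlo estimate of the (1 - alpha)-quantile of the
    # null-thresholding statistic under the pure-noise null model
    # y ~ N(0, sigma^2 I): with probability about 1 - alpha, this
    # lambda thresholds a pure-noise response entirely to zero.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = [lasso_zero_threshold(X, sigma * rng.standard_normal(n))
             for _ in range(n_mc)]
    return np.quantile(stats, 1 - alpha)

# Illustration on a random Gaussian design with p >> n.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
lam = quantile_universal_threshold(X, alpha=0.05, seed=1)
```

Selecting $\lambda$ this way replaces a search over an unknown penalty scale by a single probabilistic choice of $\alpha$: a smaller $\alpha$ gives a larger threshold and hence a sparser model.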

#### Article information

Source
Electron. J. Statist., Volume 11, Number 2 (2017), 4701-4722.

Dates
First available in Project Euclid: 24 November 2017

https://projecteuclid.org/euclid.ejs/1511492459

Digital Object Identifier
doi:10.1214/17-EJS1366

Mathematical Reviews number (MathSciNet)
MR3729656

Zentralblatt MATH identifier
1384.62258

#### Citation

Giacobino, Caroline; Sardy, Sylvain; Diaz-Rodriguez, Jairo; Hengartner, Nick. Quantile universal threshold. Electron. J. Statist. 11 (2017), no. 2, 4701--4722. doi:10.1214/17-EJS1366. https://projecteuclid.org/euclid.ejs/1511492459
