The Annals of Statistics

Nonpenalized variable selection in high-dimensional linear model settings via generalized fiducial inference

Abstract

Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model. However, coefficient estimates are unreliable when the design matrix is collinear. To overcome this challenge, an entirely new perspective on variable selection is presented within a generalized fiducial inference framework. The new procedure effectively accounts for linear dependencies among subsets of covariates in a high-dimensional setting where $p$ can grow almost exponentially in $n$, as well as in the classical setting where $p\le n$. It is shown that the procedure very naturally assigns small probabilities to subsets of covariates that include redundancies, by way of explicit $L_{0}$ minimization. Furthermore, under a typical sparsity assumption, the proposed method is shown to be consistent in the sense that the probability assigned to the true sparse subset of covariates converges in probability to 1 as $n\to\infty$, or as $n\to\infty$ and $p\to\infty$. Only mild conditions are required, and little restriction is placed on the class of possible subsets of covariates, to achieve this consistency result.
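To make the idea of explicit $L_{0}$ minimization concrete, the sketch below runs a brute-force best-subset search on a toy linear model with one nearly collinear covariate. This is not the paper's generalized fiducial procedure; the penalty `lam`, the toy design, and all names here are illustrative assumptions, showing only the generic kind of $L_{0}$ scoring of candidate subsets that the abstract refers to. Note that a redundant pair of columns is never selected together, since including both adds to the penalty without reducing the residual sum of squares.

```python
# Illustrative sketch only: brute-force L0 (best-subset) selection on a toy
# linear model, scoring each candidate subset by RSS + lam * |subset|.
# NOT the paper's generalized fiducial procedure; a generic illustration.
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.standard_normal((n, p))
X[:, 3] = X[:, 0] + 0.01 * rng.standard_normal(n)  # near-collinear redundancy
beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.0])   # true sparse support {0, 2}
y = X @ beta + 0.5 * rng.standard_normal(n)

def score(subset, lam=5.0):
    """Residual sum of squares plus an L0 penalty lam * |subset|."""
    if not subset:
        return y @ y
    Xs = X[:, list(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return resid @ resid + lam * len(subset)

# Exhaustive search over all 2^p subsets (feasible only for small p).
best = min(
    (frozenset(s) for k in range(p + 1) for s in combinations(range(p), k)),
    key=score,
)
# Expect a size-2 subset containing column 2 and exactly one of the
# redundant pair {0, 3} -- never both.
print(sorted(best))
```

Because columns 0 and 3 are nearly identical, subsets containing both are redundant: the second column barely reduces the residual sum of squares but still pays the full $L_0$ penalty, so such subsets score poorly. This mirrors, in a crude way, the abstract's point that collinear redundancies should receive little weight.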

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1723–1753.

Dates
First available in Project Euclid: 13 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1550026855

Digital Object Identifier
doi:10.1214/18-AOS1733

Mathematical Reviews number (MathSciNet)
MR3911128

Zentralblatt MATH identifier
07053524

Citation

Williams, Jonathan P.; Hannig, Jan. Nonpenalized variable selection in high-dimensional linear model settings via generalized fiducial inference. Ann. Statist. 47 (2019), no. 3, 1723–1753. doi:10.1214/18-AOS1733. https://projecteuclid.org/euclid.aos/1550026855
