## The Annals of Statistics

### On the computational complexity of high-dimensional Bayesian variable selection

#### Abstract

We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis–Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
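
The abstract names the ingredients of the sampler without spelling them out. As a rough illustration only, the sketch below runs a Metropolis–Hastings chain over inclusion vectors $\gamma \in \{0,1\}^p$. It rests on assumptions that are not stated in the abstract: a Zellner $g$-prior on the active coefficients, an improper $1/\sigma^2$ prior on the noise variance, and a truncated sparsity prior $\pi(\gamma) \propto p^{-\kappa|\gamma|}\,\mathbb{1}\{|\gamma| \le s_0\}$. Under those assumptions, the unnormalized log posterior is $-\kappa|\gamma|\log p - \tfrac{|\gamma|}{2}\log(1+g) - \tfrac{n}{2}\log\bigl(1 + g(1 - R_\gamma^2)\bigr)$, where $R_\gamma^2$ is the fraction of $\|y\|^2$ explained by the least-squares fit on the columns in $\gamma$. Proposals mix single flips (add or delete one covariate) with swaps. The names `log_posterior`, `mh_step` and `run_sampler`, and the parameters `g`, `kappa` and `s0`, are hypothetical. The sketch shows a chain of the kind whose mixing time the paper studies. It does not reproduce the spectral-gap or canonical-path analysis.

```python
import numpy as np


def log_posterior(gamma, X, y, g, kappa, s0):
    """Unnormalized log posterior of the model indexed by a boolean vector gamma.

    Assumed (hypothetical) setup: Zellner g-prior on the active coefficients,
    improper 1/sigma^2 prior on the noise variance, and a truncated sparsity
    prior proportional to p**(-kappa * |gamma|) on models with |gamma| <= s0.
    """
    n, p = X.shape
    k = int(gamma.sum())
    if k > s0:
        return -np.inf  # truncation: larger models get zero prior mass
    if k == 0:
        r2 = 0.0
    else:
        Xg = X[:, gamma]
        # Least-squares fit; ||fitted||^2 equals y' P_gamma y for the projection P_gamma.
        coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        fitted = Xg @ coef
        r2 = float(fitted @ fitted) / float(y @ y)
    return (-kappa * k * np.log(p)
            - 0.5 * k * np.log1p(g)
            - 0.5 * n * np.log1p(g * (1.0 - r2)))


def mh_step(gamma, logp, X, y, g, kappa, s0, rng):
    """One Metropolis-Hastings update mixing single-flip and swap proposals."""
    p = gamma.size
    proposal = gamma.copy()
    if rng.random() < 0.5:
        # Single flip: add or delete one covariate chosen uniformly from all p.
        j = rng.integers(p)
        proposal[j] = not proposal[j]
    else:
        # Swap: drop one active covariate and add one inactive covariate.
        active = np.flatnonzero(gamma)
        inactive = np.flatnonzero(~gamma)
        if active.size == 0 or inactive.size == 0:
            return gamma, logp  # no swap is possible; stay put
        proposal[rng.choice(active)] = False
        proposal[rng.choice(inactive)] = True
    logp_new = log_posterior(proposal, X, y, g, kappa, s0)
    # Both proposals are symmetric, so the Hastings ratio is the posterior ratio.
    if rng.random() < np.exp(min(0.0, logp_new - logp)):
        return proposal, logp_new
    return gamma, logp


def run_sampler(X, y, g, kappa, s0, n_iter, seed=0):
    """Run the chain from the empty model and return the final state."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(X.shape[1], dtype=bool)
    logp = log_posterior(gamma, X, y, g, kappa, s0)
    for _ in range(n_iter):
        gamma, logp = mh_step(gamma, logp, X, y, g, kappa, s0, rng)
    return gamma


if __name__ == "__main__":
    # Toy data: three strong signals among 50 covariates.
    rng = np.random.default_rng(1)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = 2.0
    y = X @ beta + rng.standard_normal(n)
    gamma = run_sampler(X, y, g=p**2, kappa=1.0, s0=10, n_iter=2000)
    print("selected covariates:", np.flatnonzero(gamma))
```

Because both move types propose symmetrically, the acceptance probability reduces to the posterior ratio. The truncation acts only by assigning `-inf` log mass to models larger than `s0`. To keep the sketch short, it refits least squares from scratch at every step. A practical implementation would update the fit incrementally.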

#### Article information

Source
Ann. Statist., Volume 44, Number 6 (2016), 2497–2532.

Dates
Revised: September 2015
First available in Project Euclid: 23 November 2016

https://projecteuclid.org/euclid.aos/1479891626

Digital Object Identifier
doi:10.1214/15-AOS1417

Mathematical Reviews number (MathSciNet)
MR3576552

Zentralblatt MATH identifier
1359.62088

Subjects
Primary: 62F15: Bayesian inference
Secondary: 60J10: Markov chains (discrete-time Markov processes on discrete state spaces)

#### Citation

Yang, Yun; Wainwright, Martin J.; Jordan, Michael I. On the computational complexity of high-dimensional Bayesian variable selection. Ann. Statist. 44 (2016), no. 6, 2497–2532. doi:10.1214/15-AOS1417. https://projecteuclid.org/euclid.aos/1479891626

#### Supplemental materials

• Supplement to “On the computational complexity of high-dimensional Bayesian variable selection”. Owing to space constraints, we have moved some materials and technical proofs to the Appendix, which is contained in the supplementary document.