Electronic Journal of Statistics

High-dimensional Bayesian inference in nonparametric additive models

Zuofeng Shang and Ping Li

Full-text: Open access


A fully Bayesian approach is proposed for ultrahigh-dimensional nonparametric additive models in which the number of additive components may be larger than the sample size, though ideally the true model is believed to include only a small number of components. Bayesian approaches can conduct stochastic model search and fulfill flexible parameter estimation by stochastic draws. The theory shows that the proposed model selection method has satisfactory properties. For instance, when the hyperparameter associated with the model prior is correctly specified, the true model has posterior probability approaching one as the sample size goes to infinity; when this hyperparameter is incorrectly specified, the selected model is still acceptable since asymptotically it is shown to be nested in the true model. To enhance model flexibility, two new $g$-priors are proposed and their theoretical performance is investigated. We also propose an efficient reversible jump MCMC algorithm to handle the computational issues. Several simulation examples are provided to demonstrate the advantages of our method.
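The kind of stochastic model search described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It replaces reversible jump MCMC with a plain add/delete Metropolis sampler over component-inclusion indicators, uses a fixed-$g$ Zellner prior (with the standard closed-form null-based Bayes factor) rather than the generalized Zellner-Siow or hyper-$g$ priors, and expands each covariate in a centered polynomial basis. The function names, the basis, the default $g = n$, and the prior inclusion probability are all assumptions made for illustration.

```python
import numpy as np


def basis(x, m=4):
    """Centered polynomial basis for one covariate (hypothetical stand-in
    for whatever basis expansion a real analysis would use)."""
    B = np.column_stack([x ** k for k in range(1, m + 1)])
    return B - B.mean(axis=0)


def log_bayes_factor(y, Z, g):
    """Log Bayes factor of the model with design Z against the
    intercept-only model under a fixed-g Zellner prior."""
    n, k = len(y), Z.shape[1]
    if k == 0:
        return 0.0
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Z @ coef) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def search_components(X, y, m=4, g=None, prior_incl=0.1, iters=2000, seed=0):
    """Add/delete Metropolis search over additive components.
    Returns the fraction of iterations in which each component was included."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = float(n) if g is None else g  # assumption: unit-information choice g = n
    groups = [basis(X[:, j], m) for j in range(p)]

    def log_post(gamma):
        idx = np.flatnonzero(gamma)
        Z = np.column_stack([groups[j] for j in idx]) if idx.size else np.empty((n, 0))
        s = idx.size
        # independent Bernoulli(prior_incl) prior on each component
        log_prior = s * np.log(prior_incl) + (p - s) * np.log1p(-prior_incl)
        return log_bayes_factor(y, Z, g) + log_prior

    gamma = np.zeros(p, dtype=bool)
    current = log_post(gamma)
    counts = np.zeros(p)
    for _ in range(iters):
        j = rng.integers(p)
        proposal = gamma.copy()
        proposal[j] = ~proposal[j]  # flip one component in or out
        candidate = log_post(proposal)
        if np.log(rng.random()) < candidate - current:
            gamma, current = proposal, candidate
        counts += gamma
    return counts / iters
```

Calling `search_components(X, y)` on an `n × p` covariate matrix returns one inclusion frequency per additive component. Because the coefficients are integrated out analytically under the fixed-$g$ prior, the sampler only moves across models, which is what keeps this sketch short.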

Article information

Electron. J. Statist., Volume 8, Number 2 (2014), 2804-2847.

First available in Project Euclid: 8 January 2015

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1420726192

Digital Object Identifier
doi:10.1214/14-EJS963


Primary: 62G20: Asymptotic properties; 62F25: Tolerance and confidence regions
Secondary: 62F15: Bayesian inference; 62F12: Asymptotic properties of estimators

Keywords: Bayesian group selection; ultrahigh-dimensionality; nonparametric additive model; posterior model consistency; size-control prior; generalized Zellner-Siow prior; generalized hyper-$g$ prior; reversible jump MCMC


Shang, Zuofeng; Li, Ping. High-dimensional Bayesian inference in nonparametric additive models. Electron. J. Statist. 8 (2014), no. 2, 2804--2847. doi:10.1214/14-EJS963. https://projecteuclid.org/euclid.ejs/1420726192


