The Annals of Statistics

Bayesian manifold regression

Yun Yang and David B. Dunson

Full-text: Open access


There is increasing interest in the problem of nonparametric regression with high-dimensional predictors. When the number of predictors $D$ is large, one encounters a daunting problem in attempting to estimate a $D$-dimensional surface based on limited data. Fortunately, in many applications, the support of the data is concentrated on a $d$-dimensional subspace with $d\ll D$. Manifold learning attempts to estimate this subspace. Our focus is on developing computationally tractable and theoretically supported Bayesian nonparametric regression methods in this context. When the subspace corresponds to a locally-Euclidean compact Riemannian manifold, we show that a Gaussian process regression approach can be applied that leads to the minimax optimal adaptive rate in estimating the regression function under some conditions. The proposed model bypasses the need to estimate the manifold, and can be implemented using standard algorithms for posterior computation in Gaussian processes. Finite sample performance is illustrated in a data analysis example.
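The idea can be illustrated with a minimal, self-contained sketch (not the authors' implementation): Gaussian process regression with a squared-exponential kernel applied directly to high-dimensional ambient coordinates of data that actually lie on a circle, a $1$-dimensional manifold embedded in $D=10$ dimensions. As in the paper, the GP never estimates the manifold; it simply regresses on the $D$-dimensional inputs. All numbers (bandwidth, noise level, sample size) are illustrative choices, not values from the paper.

```python
import math

def sq_exp_kernel(x, y, bandwidth):
    # Squared-exponential kernel evaluated on the D-dimensional ambient coordinates.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * bandwidth ** 2))

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting (pure Python).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior_mean(X, y, X_new, bandwidth=0.5, noise=1e-4):
    # GP posterior mean: k(x*, X) (K + sigma^2 I)^{-1} y.
    K = [[sq_exp_kernel(xi, xj, bandwidth) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    alpha = solve(K, y)
    return [sum(sq_exp_kernel(xs, xi, bandwidth) * a for xi, a in zip(X, alpha))
            for xs in X_new]

# Toy data: a circle (d = 1) embedded in D = 10 ambient dimensions.
D = 10
def embed(theta):
    # First two coordinates trace the circle; the remaining D - 2 are constant.
    return [math.cos(theta), math.sin(theta)] + [0.0] * (D - 2)

train_thetas = [2 * math.pi * i / 40 for i in range(40)]
X = [embed(t) for t in train_thetas]
y = [math.sin(2 * t) for t in train_thetas]  # a smooth function on the manifold

pred = gp_posterior_mean(X, y, [embed(1.0)])  # predict at a held-out point
```

Because the squared-exponential kernel depends only on ambient Euclidean distances, which locally track geodesic distances on the manifold, the fit behaves like a $d$-dimensional rather than a $D$-dimensional regression; in the paper this intuition is made precise via contraction rates for rescaled Gaussian processes.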

Article information

Ann. Statist., Volume 44, Number 2 (2016), 876-905.

Received: December 2014
Revised: September 2015
First available in Project Euclid: 17 March 2016


Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]; 62-07: Data analysis
Secondary: 65U05; 68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords: asymptotics; contraction rates; dimensionality reduction; Gaussian process; manifold learning; nonparametric Bayes; subspace learning


Yang, Yun; Dunson, David B. Bayesian manifold regression. Ann. Statist. 44 (2016), no. 2, 876--905. doi:10.1214/15-AOS1390.



  • [1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337–404.
  • [2] Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 1373–1396.
  • [3] Bhattacharya, A., Pati, D. and Dunson, D. (2014). Anisotropic function estimation using multi-bandwidth Gaussian processes. Ann. Statist. 42 352–381.
  • [4] Bickel, P. J. and Kleijn, B. J. K. (2012). The semiparametric Bernstein–von Mises theorem. Ann. Statist. 40 206–237.
  • [5] Bickel, P. J. and Li, B. (2007). Local polynomial regression on unknown manifolds. In Complex Datasets and Inverse Problems. Institute of Mathematical Statistics Lecture Notes—Monograph Series 54 177–186. IMS, Beachwood, OH.
  • [6] Binev, P., Cohen, A., Dahmen, W. and DeVore, R. (2007). Universal algorithms for learning theory. II. Piecewise polynomial functions. Constr. Approx. 26 127–152.
  • [7] Binev, P., Cohen, A., Dahmen, W., DeVore, R. and Temlyakov, V. (2005). Universal algorithms for learning theory. I. Piecewise constant functions. J. Mach. Learn. Res. 6 1297–1321.
  • [8] Camastra, F. and Vinciarelli, A. (2002). Estimating the intrinsic dimension of data with a fractal-based method. IEEE Trans. Pattern Anal. Mach. Intell. 24 1404–1407.
  • [9] Carter, K. M., Raich, R. and Hero, A. O. III (2010). On local intrinsic dimension estimation and its applications. IEEE Trans. Signal Process. 58 650–663.
  • [10] Castillo, I., Kerkyacharian, G. and Picard, D. (2013). Thomas Bayes’ walk on manifolds. Probab. Theory Related Fields 158 665–710.
  • [11] Chen, M., Silva, J., Paisley, J., Wang, C., Dunson, D. and Carin, L. (2010). Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds. IEEE Trans. Signal Process. 58 6140–6155.
  • [12] Farahmand, A. M., Szepesvári, C. and Audibert, J. (2007). Manifold-adaptive dimension estimation. In ICML 2007 265–272. ACM Press, New York.
  • [13] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
  • [14] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist. 35 192–223.
  • [15] Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in $L^{r}$-metrics, $1\leq r\leq\infty$. Ann. Statist. 39 2883–2911.
  • [16] Kpotufe, S. (2009). Escaping the curse of dimensionality with a tree-based regressor. In COLT 2009—The 22nd Conference on Learning Theory, June 18–21. Montreal, QC.
  • [17] Kpotufe, S. and Dasgupta, S. (2012). A tree-based regressor that adapts to intrinsic dimension. J. Comput. System Sci. 78 1496–1515.
  • [18] Kundu, S. and Dunson, D. B. (2011). Latent factor models for density estimation. Available at arXiv:1108.2720v2.
  • [19] Lawrence, N. D. (2003). Gaussian process latent variable models for visualisation of high dimensional data. Neural Information Processing Systems 16 329–336.
  • [20] Levina, E. and Bickel, P. (2004). Maximum likelihood estimation of intrinsic dimension. In Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA.
  • [21] Lin, L. and Dunson, D. B. (2014). Bayesian monotone regression using Gaussian process projection. Biometrika 101 303–317.
  • [22] Little, A. V., Lee, J., Jung, Y. M. and Maggioni, M. (2009). Estimation of intrinsic dimensionality of samples from noisy low-dimensional manifolds in high dimensions with multiscale SVD. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing 85–88. IEEE, Cardiff.
  • [23] Nene, S. A., Nayar, S. K. and Murase, H. (1996). Columbia object image library (COIL-100). Technical report, Columbia Univ., New York.
  • [24] Page, G., Bhattacharya, A. and Dunson, D. (2013). Classification via Bayesian nonparametric learning of affine subspaces. J. Amer. Statist. Assoc. 108 187–201.
  • [25] Reich, B. J., Bondell, H. D. and Li, L. (2011). Sufficient dimension reduction via Bayesian mixture modeling. Biometrics 67 886–895.
  • [26] Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290 2323–2326.
  • [27] Savitsky, T., Vannucci, M. and Sha, N. (2011). Variable selection for nonparametric Gaussian process priors: Models and computational strategies. Statist. Sci. 26 130–149.
  • [28] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040–1053.
  • [29] Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290 2319–2323.
  • [30] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
  • [31] Tokdar, S. T., Zhu, Y. M. and Ghosh, J. K. (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Anal. 5 319–344.
  • [32] van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge.
  • [33] van der Vaart, A. and van Zanten, H. (2011). Information rates of nonparametric Gaussian process methods. J. Mach. Learn. Res. 12 2095–2119.
  • [34] van der Vaart, A. W. and van Zanten, J. H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. Collect. 3 200–222. IMS, Beachwood, OH.
  • [35] van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist. 37 2655–2675.
  • [36] Yang, Y. and Dunson, D. B. (2015). Supplement to “Bayesian manifold regression.” DOI:10.1214/15-AOS1390SUPP.
  • [37] Ye, G.-B. and Zhou, D.-X. (2008). Learning and approximation by Gaussians on Riemannian manifolds. Adv. Comput. Math. 29 291–310.
  • [38] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301–320.

Supplemental materials

  • Reviews of geometric properties and proofs of Theorems 2.1, 2.2, 2.4 and 3.2. Section 7 reviews concepts and results from differential and Riemannian geometry and includes new results with proofs. Section 8 then provides the proofs of Theorems 2.1, 2.2, 2.4 and 3.2.