## Electronic Journal of Statistics

### A spectral series approach to high-dimensional nonparametric regression

#### Abstract

A key question in modern statistics is how to make fast and reliable inferences for complex, high-dimensional data. While there has been much interest in sparse techniques, current methods do not generalize well to data with nonlinear structure. In this work, we present an orthogonal series estimator for predictors that are complex aggregate objects, such as natural images, galaxy spectra, trajectories, and movies. Our series approach ties together ideas from manifold learning, kernel machine learning, and Fourier methods. We expand the unknown regression on the data in terms of the eigenfunctions of a kernel-based operator, and we take advantage of orthogonality of the basis with respect to the underlying data distribution, $P$, to speed up computations and tuning of parameters. If the kernel is appropriately chosen, then the eigenfunctions adapt to the intrinsic geometry and dimension of the data. We provide theoretical guarantees for a radial kernel with varying bandwidth, and we relate smoothness of the regression function with respect to $P$ to sparsity in the eigenbasis. Finally, using simulated and real-world data, we systematically compare the performance of the spectral series approach with classical kernel smoothing, k-nearest neighbors regression, kernel ridge regression, and state-of-the-art manifold and local regression methods.
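To make the estimator concrete, here is a minimal sketch of a spectral series regression of the kind the abstract describes: eigendecompose an empirical kernel operator, treat the rescaled eigenvectors as basis functions orthonormal with respect to the empirical data distribution, project the responses onto the leading eigenfunctions, and extend to new points via the Nyström formula. The Gaussian kernel, the fixed bandwidth `eps`, the truncation level `J`, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def fit_spectral_series(X, y, eps=0.01, J=5):
    """Illustrative spectral series regression fit (not the paper's exact method).

    Expands y in the leading eigenfunctions of the empirical kernel
    operator (K_n f)(x) = (1/n) sum_i k(x, x_i) f(x_i).
    """
    n = X.shape[0]
    # Radial (Gaussian) kernel matrix on the training sample
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (4.0 * eps))
    # Eigendecomposition; eigh returns eigenvalues in ascending order
    mu, U = np.linalg.eigh(K)
    idx = np.argsort(mu)[::-1][:J]       # keep the J leading eigenpairs
    mu, U = mu[idx], U[:, idx]
    # Rescale so the basis is orthonormal in L2 of the empirical measure:
    # (1/n) sum_i psi_j(x_i) psi_k(x_i) = delta_jk
    psi = np.sqrt(n) * U
    lam = mu / n                          # eigenvalues of the empirical operator
    # Fourier-type coefficients: projections of y onto the eigenbasis
    beta = psi.T @ y / n
    return {"X": X, "eps": eps, "psi": psi, "lam": lam, "beta": beta}

def predict_spectral_series(model, Xnew):
    """Predict at new points via the Nystrom extension of each eigenfunction."""
    d2 = ((Xnew[:, None, :] - model["X"][None, :, :]) ** 2).sum(axis=-1)
    Knew = np.exp(-d2 / (4.0 * model["eps"]))
    n = model["X"].shape[0]
    # psi_j(x) = (1/(n*lam_j)) sum_i k(x, x_i) psi_j(x_i)
    psi_new = Knew @ model["psi"] / (n * model["lam"])
    return psi_new @ model["beta"]
```

Because orthogonality holds with respect to the empirical measure, the coefficients are plain averages rather than the solution of a linear system, which is what makes computation and tuning fast; in practice `eps` and `J` would be chosen by cross-validation.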

#### Article information

Source
Electron. J. Statist., Volume 10, Number 1 (2016), 423-463.

Dates
First available in Project Euclid: 24 February 2016

https://projecteuclid.org/euclid.ejs/1456322681

Digital Object Identifier
doi:10.1214/16-EJS1112

Mathematical Reviews number (MathSciNet)
MR3466189

Zentralblatt MATH identifier
1332.62133

Subjects
Primary: 62G08: Nonparametric regression

#### Citation

Lee, Ann B.; Izbicki, Rafael. A spectral series approach to high-dimensional nonparametric regression. Electron. J. Statist. 10 (2016), no. 1, 423–463. doi:10.1214/16-EJS1112. https://projecteuclid.org/euclid.ejs/1456322681

#### References

• [1] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society 68(3), 337–404.
• [2] Aswani, A., P. Bickel, and C. Tomlin (2011). Regression on manifolds: Estimation of the exterior derivative. Annals of Statistics 39(1), 48–81.
• [3] Belkin, M. and P. Niyogi (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), 1373–1396.
• [4] Belkin, M. and P. Niyogi (2005a). Semi-supervised learning on Riemannian manifolds. Machine Learning 56, 209–239.
• [5] Belkin, M. and P. Niyogi (2005b). Towards a theoretical foundation for Laplacian-based manifold methods. In Proc. Conf. on Learning Theory, Volume 3559, pp. 486–500.
• [6] Belkin, M., P. Niyogi, and V. Sindhwani (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7, 2399–2434.
• [7] Berry, T. and T. Sauer (2016). Density estimation on manifolds with boundary. Preprint, arXiv:1511.08271v3.
• [8] Bickel, P. J. and B. Li (2007). Local polynomial regression on unknown manifolds. In IMS Lecture Notes–Monograph Series, Complex Datasets and Inverse Problems, Volume 54, pp. 177–186. Institute of Mathematical Statistics.
• [9] Bousquet, O., O. Chapelle, and M. Hein (2003). Measure based regularization. In Adv. in Neural Inf. Processing Systems.
• [10] Candès, E. and T. Tao (2005). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics.
• [11] Cheng, M. Y. and H. T. Wu (2013). Local linear regression on manifolds and its geometric interpretation. Journal of the American Statistical Association 108, 1421–1434.
• [12] Coifman, R. and S. Lafon (2006a). Diffusion maps. Applied and Computational Harmonic Analysis 21, 5–30.
• [13] Coifman, R. and S. Lafon (2006b). Geometric harmonics. Applied and Computational Harmonic Analysis 21, 31–52.
• [14] Coifman, R., S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker (2005). Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. of the National Academy of Sciences 102(21), 7426–7431.
• [15] Cucker, F. and D. Zhou (2007). Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press.
• [16] Donoho, D. and C. Grimes (2003). Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data. Proc. of the National Academy of Sciences 100(10), 5591–5596.
• [17] Efromovich, S. (1999). Nonparametric Curve Estimation: Methods, Theory and Application. Springer.
• [18] Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Annals of Statistics 21, 196–216.
• [19] Giné, E. and A. Guillou (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. H. Poincaré 38, 907–921.
• [20] Giné, E. and V. Koltchinskii (2006). Empirical graph Laplacian approximation of Laplace–Beltrami operators: Large sample results. In High Dimensional Probability: Proceedings of the Fourth International Conference, IMS Lecture Notes, pp. 1–22.
• [21] Girosi, F., M. Jones, and T. Poggio (1995). Regularization theory and neural network architectures. Neural Computation 7, 219–269.
• [22] Grigor'yan, A. (2006). Heat kernels on weighted manifolds and applications. Cont. Math. 398, 93–191.
• [23] Halko, N., P. G. Martinsson, and J. A. Tropp (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2), 217–288.
• [24] Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning (Second ed.). Springer.
• [25] Hein, M., J.-Y. Audibert, and U. von Luxburg (2005). Intrinsic dimensionality estimation of submanifolds in $R^d$. In Proc. of the 22nd Int'l Conf. on Machine Learning.
• [26] Henry, G. and D. Rodriguez (2009). Kernel density estimation on Riemannian manifolds: Asymptotic results. Journal of Mathematical Imaging and Vision 34(3), 235–239.
• [27] Izbicki, R., A. Lee, and C. Schafer (2014). High-dimensional density ratio estimation with extensions to approximate likelihood computation. Journal of Machine Learning Research (AISTATS Track), 420–429.
• [28] Izbicki, R. and A. Lee (2016). Conditional density estimation in a high-dimensional regression setting. Journal of Computational and Graphical Statistics, to appear.
• [29] Izbicki, R. and A. Lee (2016). Supplement to "A spectral series approach to high-dimensional nonparametric regression." DOI: 10.1214/16-EJS1112SUPP.
• [30] Jolliffe, I. T. (2002). Principal Component Analysis. Springer.
• [31] Kpotufe, S. (2011). k-NN regression adapts to local intrinsic dimension. In Advances in Neural Information Processing Systems 24, pp. 729–737. The MIT Press.
• [32] Lafferty, J. and L. Wasserman (2008). Rodeo: Sparse, greedy nonparametric regression. Annals of Statistics 36(1), 28–63.
• [33] Lafon, S. and A. Lee (2006). Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans. Pattern Anal. and Mach. Intel. 28, 1393–1403.
• [34] Lee, A. and L. Wasserman (2010). Spectral connectivity analysis. Journal of the American Statistical Association 105(491), 1241–1255.
• [35] Mallat, S. (2009). A Wavelet Tour of Signal Processing (3rd ed.). Academic Press.
• [36] Meila, M. and J. Shi (2001). A random walks view of spectral segmentation. In Proc. Eighth International Conference on Artificial Intelligence and Statistics.
• [37] Minh, H. Q., P. Niyogi, and Y. Yao (2006). Mercer's theorem, feature maps, and smoothing. In Learning Theory, 19th Annual Conference on Learning Theory.
• [38] Nadler, B., N. Srebro, and X. Zhou (2009). Semi-supervised learning with the graph Laplacian: The limit of infinite unlabelled data.
• [39] Olshausen, B. A. and D. J. Field (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583), 607–609.
• [40] Ozakin, A. and A. Gray (2009). Submanifold density estimation. In Adv. in Neural Inf. Processing Systems, pp. 1375–1382.
• [41] Ravikumar, P., J. Lafferty, H. Liu, and L. Wasserman (2009). Sparse additive models. Journal of the Royal Statistical Society, Series B 71(5), 1009–1030.
• [42] Richards, J. W., P. E. Freeman, A. B. Lee, and C. M. Schafer (2009). Exploiting low-dimensional structure in astronomical spectra. Astrophysical Journal 691, 32–42.
• [43] Rosasco, L., M. Belkin, and E. D. Vito (2008). A note on perturbation results for learning empirical operators. CSAIL Technical Report TR-2008-052, CBCL-274, Massachusetts Institute of Technology.
• [44] Rosasco, L., M. Belkin, and E. D. Vito (2010). On learning with integral operators. Journal of Machine Learning Research 11, 905–934.
• [45] Saerens, M., F. Fouss, L. Yen, and P. Dupont (2004). The principal components analysis of a graph, and its relationships to spectral clustering. In Proceedings of the 15th European Conference on Machine Learning (ECML 2004), Lecture Notes in Artificial Intelligence, pp. 371–383. Springer-Verlag.
• [46] Safarov, Y. and D. Vassiliev (1996). The Asymptotic Distribution of Eigenvalues of Partial Differential Operators, Volume 155 of Translations of Mathematical Monographs. American Mathematical Society.
• [47] Schoenberg, I. J. (1938). Metric spaces and completely monotone functions. Annals of Mathematics 39(4), 811–841.
• [48] Schölkopf, B., A. Smola, and K. R. Müller (1997). Kernel principal component analysis. In Artificial Neural Networks – ICANN'97, pp. 583–588. Springer.
• [49] Schölkopf, B. and A. J. Smola (2001). Learning with Kernels. MIT Press.
• [50] Shi, T., M. Belkin, and B. Yu (2009). Data spectroscopy: Eigenspaces of convolution operators and clustering. The Annals of Statistics 37(6B), 3960–3984.
• [51] Singer, A. (2006). From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis 21, 128–134.
• [52] Slepian, D. (1983). Some comments on Fourier analysis, uncertainty and modeling. SIAM Review 25(3), 379–393.
• [53] Steinwart, I. and A. Christmann (2008). Support Vector Machines. Springer.
• [54] Tenenbaum, J. B., V. de Silva, and J. C. Langford (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323.
• [55] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, Methodological 58, 267–288.
• [56] Vapnik, V. (1996). Statistical Learning Theory. Wiley.
• [57] von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416.
• [58] Wahba, G. (1990). Spline Models for Observational Data. SIAM.
• [59] Wold, S., M. Sjöström, and L. Eriksson (2001). PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58, 109–130.
• [60] Wu, Q., Y. Ying, and D.-X. Zhou (2007). Multi-kernel regularized classifiers. Journal of Complexity 23, 108–134.
• [61] Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68(1), 49–67.
• [62] Zhou, X. and N. Srebro (2011). Error analysis of Laplacian eigenmaps for semi-supervised learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Volume 15, pp. 892–900.
• [63] Zhu, X., Z. Ghahramani, and J. Lafferty (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML-03, 20th International Conference on Machine Learning.
• [64] Zhu, X. and A. B. Goldberg (2009). Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3(1), 1–130.
• [65] Zou, H. and T. Hastie (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67(2), 301–320.