Annals of Statistics

Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels

Robert L. Wolpert, Merlise A. Clyde, and Chong Tu

Abstract

This article describes a new class of prior distributions for nonparametric function estimation. The unknown function is modeled as a limit of weighted sums of kernels or generator functions indexed by continuous parameters that control local and global features such as their translation, dilation, modulation and shape. Lévy random fields and their stochastic integrals are employed to induce prior distributions for the unknown functions or, equivalently, for the number of kernels and for the parameters governing their features. Scaling, shape, and other features of the generating functions are location-specific to allow quite different function properties in different parts of the space, as with wavelet bases and other methods employing overcomplete dictionaries. We provide conditions under which the stochastic expansions converge in specified Besov or Sobolev norms. Under a Gaussian error model, this may be viewed as a sparse regression problem, with regularization induced via the Lévy random field prior distribution. Posterior inference for the unknown functions is based on a reversible jump Markov chain Monte Carlo algorithm. We compare the Lévy Adaptive Regression Kernel (LARK) method to wavelet-based methods using some of the standard test functions, and illustrate its flexibility and adaptability in nonstationary applications.
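The construction in the abstract, a function built as a weighted sum of kernels whose number, locations, and shapes are all random, can be illustrated with a minimal sketch of a single prior draw. The sketch covers only the finite (compound Poisson) case. The Gaussian kernel, the gamma distribution for scales, the normal distribution for weights, and all names (`gaussian_kernel`, `draw_prior_function`, `rate`) are assumptions made for illustration, not the specification used in the article.

```python
import numpy as np


def gaussian_kernel(x, center, scale):
    """Illustrative generator: a Gaussian bump at `center` with width `scale`."""
    return np.exp(-0.5 * ((x - center) / scale) ** 2)


def draw_prior_function(x, rate=10.0, seed=None):
    """Draw one random function f(x) = sum_j weight_j * k(x; center_j, scale_j).

    Assumed (hypothetical) prior: a Poisson number of kernels, uniform
    locations on the range of x, gamma-distributed scales, and standard
    normal weights. Each kernel has its own scale, so smoothness can
    differ from one region of the domain to another.
    """
    rng = np.random.default_rng(seed)
    n_kernels = rng.poisson(rate)                      # random dictionary size
    centers = rng.uniform(x.min(), x.max(), size=n_kernels)
    scales = rng.gamma(shape=2.0, scale=0.05, size=n_kernels)
    weights = rng.normal(0.0, 1.0, size=n_kernels)

    # Design matrix of shape (n_points, n_kernels); one column per kernel.
    design = gaussian_kernel(x[:, None], centers[None, :], scales[None, :])
    f = design @ weights                               # zero vector if n_kernels == 0
    return f, centers, scales, weights


x = np.linspace(0.0, 1.0, 200)
f, centers, scales, weights = draw_prior_function(x, rate=10.0, seed=0)
```

Under a Gaussian error model, the observations would be `f` plus noise, and a reversible jump sampler would add, delete, and update the triples `(center, scale, weight)`; that posterior step is not shown here.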

Article information

Source
Ann. Statist., Volume 39, Number 4 (2011), 1916-1962.

Dates
First available in Project Euclid: 24 August 2011

Permanent link to this document
https://projecteuclid.org/euclid.aos/1314190619

Digital Object Identifier
doi:10.1214/11-AOS889

Mathematical Reviews number (MathSciNet)
MR2893857

Zentralblatt MATH identifier
1227.62030

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 60E07: Infinitely divisible distributions; stable distributions

Keywords
Bayes; Besov; kernel regression; LARK; Lévy random field; nonparametric regression; relevance vector machine; reversible jump Markov chain Monte Carlo; splines; support vector machine; wavelets

Citation

Wolpert, Robert L.; Clyde, Merlise A.; Tu, Chong. Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels. Ann. Statist. 39 (2011), no. 4, 1916–1962. doi:10.1214/11-AOS889. https://projecteuclid.org/euclid.aos/1314190619

