Bayesian Analysis

Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures

Catia Scricciolo


We consider Bayesian nonparametric density estimation using a Pitman-Yor or a normalized inverse-Gaussian process convolution kernel mixture as the prior distribution for a density. The procedure is studied from a frequentist perspective. Using the stick-breaking representation of the Pitman-Yor process and the finite-dimensional distributions of the normalized inverse-Gaussian process, we prove that, when the data are independent replicates from a density with analytic or Sobolev smoothness, the posterior distribution concentrates on shrinking $L^{p}$-norm balls around the sampling density at a minimax-optimal rate, up to a logarithmic factor. The resulting hierarchical Bayesian procedure, with a fixed prior, is adaptive to the unknown smoothness of the sampling density.
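As a rough illustration of the stick-breaking representation mentioned in the abstract, the following sketch draws truncated Pitman-Yor mixing weights using the standard construction $V_k \sim \mathrm{Beta}(1-d,\ \theta+kd)$, $w_k = V_k\prod_{j<k}(1-V_j)$. The function name `pitman_yor_weights`, the parameter names `discount` and `strength`, and the truncation at `n_atoms` are assumptions made here for illustration; they are not taken from the paper, which works with the full infinite representation.

```python
import random


def pitman_yor_weights(discount, strength, n_atoms, seed=None):
    """Draw the first n_atoms stick-breaking weights of a Pitman-Yor process.

    Hypothetical helper for illustration only. Returns the list of weights
    and the leftover stick mass not yet assigned to any atom.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if strength <= -discount:
        raise ValueError("strength must exceed -discount")

    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick still unbroken
    for k in range(1, n_atoms + 1):
        # V_k ~ Beta(1 - d, theta + k * d)
        v = rng.betavariate(1.0 - discount, strength + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


if __name__ == "__main__":
    w, leftover = pitman_yor_weights(discount=0.25, strength=1.0, n_atoms=20, seed=0)
    print(len(w), sum(w) + leftover)
```

Setting `discount` to 0 reduces the draw to the Dirichlet process case. In a kernel mixture, weights of this kind would be attached to randomly drawn kernel locations; that step is omitted here.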

Article information

Bayesian Anal., Volume 9, Number 2 (2014), 475-520.

First available in Project Euclid: 26 May 2014

Keywords: adaptation; nonparametric density estimation; normalized inverse-Gaussian process; Pitman-Yor process; posterior contraction rate; sinc kernel


Scricciolo, Catia. Adaptive Bayesian Density Estimation in $L^{p}$-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures. Bayesian Anal. 9 (2014), no. 2, 475–520. doi:10.1214/14-BA863.


