  • Bernoulli
  • Volume 25, Number 4A (2019), 2982-3015.

Uniform rates of the Glivenko–Cantelli convergence and their use in approximating Bayesian inferences

Emanuele Dolera and Eugenio Regazzini


Abstract

This paper deals with suitable quantifications in approximating a probability measure by an “empirical” random probability measure $\hat{\mathfrak{p}}_{n}$, depending on the first $n$ terms of a sequence $\{\tilde{\xi}_{i}\}_{i\geq1}$ of random elements. Section 2 studies the range of oscillation near zero of the Wasserstein distance $\mathrm{d}^{(p)}_{[\mathbb{S}]}$ between $\mathfrak{p}_{0}$ and $\hat{\mathfrak{p}}_{n}$, assuming the $\tilde{\xi}_{i}$’s i.i.d. from $\mathfrak{p}_{0}$. In Theorem 2.1 $\mathfrak{p}_{0}$ can be fixed in the space of all probability measures on $(\mathbb{R}^{d},\mathscr{B}(\mathbb{R}^{d}))$ and $\hat{\mathfrak{p}}_{n}$ coincides with the empirical measure $\tilde{\mathfrak{e}}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{\tilde{\xi}_{i}}$. In Theorem 2.2 (Theorem 2.3, respectively), $\mathfrak{p}_{0}$ is a $d$-dimensional Gaussian distribution (an element of a distinguished statistical exponential family, respectively) and $\hat{\mathfrak{p}}_{n}$ is another $d$-dimensional Gaussian distribution with estimated mean and covariance matrix (another element of the same family with an estimated parameter, respectively). These new results improve on allied recent works by providing also uniform bounds with respect to $n$, meaning the finiteness of the $p$-moment of $\mathop{\mathrm{sup}}_{n\geq1}b_{n}\mathrm{d}^{(p)}_{[\mathbb{S}]}(\mathfrak{p}_{0},\hat{\mathfrak{p}}_{n})$ is proved for some diverging sequence $b_{n}$ of positive numbers. In Section 3, assuming the $\tilde{\xi}_{i}$’s exchangeable, one studies the range of oscillation near zero of the Wasserstein distance between the conditional distribution – also called posterior – of the directing measure of the sequence, given $\tilde{\xi}_{1},\ldots,\tilde{\xi}_{n}$, and the point mass at $\hat{\mathfrak{p}}_{n}$. Similarly, a bound for the approximation of predictive distributions is given. 
Finally, Theorems 3.3 to 3.5 reconsider Theorems 2.1 to 2.3, respectively, from a Bayesian perspective.
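
To make the quantities in the abstract concrete, the following is a minimal one-dimensional sketch, not the authors' method. It assumes $d=1$ and a standard Gaussian $\mathfrak{p}_{0}$, and compares $\mathfrak{p}_{0}$ with (a) a plug-in Gaussian with estimated mean and standard deviation, in the spirit of Theorem 2.2, and (b) the empirical measure $\tilde{\mathfrak{e}}_{n}$, in the spirit of Theorem 2.1. The function names `w2_gaussian_1d`, `wp_empirical_vs_normal` and `demo` are hypothetical. Case (a) uses the closed form for the 2-Wasserstein distance between two Gaussians on the real line; case (b) uses the quantile representation of the one-dimensional Wasserstein distance, evaluated with a midpoint rule, so it is only an approximation.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def w2_gaussian_1d(m0, s0, m1, s1):
    """Exact 2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2) on R."""
    # In one dimension: W_2^2 = (m0 - m1)^2 + (s0 - s1)^2.
    return math.hypot(m0 - m1, s0 - s1)


def wp_empirical_vs_normal(samples, p=1, m0=0.0, s0=1.0):
    """Approximate W_p between the empirical measure of `samples` and N(m0, s0^2).

    Uses the one-dimensional quantile representation
        W_p^p = integral over (0, 1) of |F_n^{-1}(u) - F_0^{-1}(u)|^p du,
    with a midpoint rule on each interval (i/n, (i+1)/n). This is an
    illustrative approximation, not an exact evaluation of the integral.
    """
    xs = sorted(samples)
    n = len(xs)
    p0 = NormalDist(m0, s0)
    total = sum(abs(x - p0.inv_cdf((i + 0.5) / n)) ** p for i, x in enumerate(xs))
    return (total / n) ** (1.0 / p)


def demo(n_values=(100, 1000, 10000), seed=0):
    """Return (n, plug-in W_2, approximate empirical W_1) for i.i.d. N(0, 1) draws."""
    rng = random.Random(seed)
    rows = []
    for n in n_values:
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        plug_in = w2_gaussian_1d(0.0, 1.0, fmean(xs), pstdev(xs))
        empirical = wp_empirical_vs_normal(xs, p=1)
        rows.append((n, plug_in, empirical))
    return rows


if __name__ == "__main__":
    for n, plug_in, empirical in demo():
        print(f"n={n:6d}  plug-in W2={plug_in:.4f}  empirical W1~{empirical:.4f}")
```

Running the script prints both distances for increasing $n$. A single run shows only one sample path, so it illustrates the objects being bounded rather than the uniform-in-$n$ moment bounds stated in the abstract.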

Article information

Source
Bernoulli, Volume 25, Number 4A (2019), 2982-3015.

Dates
Received: December 2017
Revised: July 2018
First available in Project Euclid: 13 September 2019

Permanent link to this document
https://projecteuclid.org/euclid.bj/1568362049

Digital Object Identifier
doi:10.3150/18-BEJ1077

Mathematical Reviews number (MathSciNet)
MR4003571

Zentralblatt MATH identifier
07110118

Keywords
dominated ergodic theorem; empirical measure; exchangeability; Glivenko–Cantelli theorem; law of the iterated logarithm; posterior distribution; predictive distribution; Wasserstein distance

Citation

Dolera, Emanuele; Regazzini, Eugenio. Uniform rates of the Glivenko–Cantelli convergence and their use in approximating Bayesian inferences. Bernoulli 25 (2019), no. 4A, 2982–3015. doi:10.3150/18-BEJ1077. https://projecteuclid.org/euclid.bj/1568362049


