## Electronic Journal of Statistics

### Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line

#### Abstract

This paper is focused on the statistical analysis of probability measures $\boldsymbol{\nu }_{1},\ldots ,\boldsymbol{\nu }_{n}$ on ${\mathbb{R}}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\boldsymbol{\nu }_{i}$ are absolutely continuous with densities $\boldsymbol{f}_{i}$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1\leq i\leq n;\;1\leq j\leq p_{i}}$ organized in the form of $n$ experimental units, such that $X_{i,1},\ldots ,X_{i,p_{i}}$ are iid observations sampled from a random measure $\boldsymbol{\nu }_{i}$ for each $1\leq i\leq n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1\leq i\leq n}p_{i}$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
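As an illustration of the estimator discussed in the abstract, the following is a minimal sketch, not the authors' implementation. It relies on the standard fact that, on the real line, the quantile function of the 2-Wasserstein barycenter is the average of the individual quantile functions. The function names, the Gaussian location-scale simulation model, and the grid of quantile levels are all assumptions made for this example; the smoothing step mentioned in the abstract is omitted.

```python
from statistics import NormalDist

import numpy as np


def barycenter_quantiles(samples, levels):
    """Average the empirical quantile functions of the n experimental units.

    `samples` is a list of 1-D arrays (unit i holds X_{i,1}, ..., X_{i,p_i});
    `levels` are quantile levels in [0, 1]. The result is the quantile
    function of the (unsmoothed) empirical barycenter at those levels.
    """
    per_unit = [np.quantile(np.asarray(s, dtype=float), levels) for s in samples]
    return np.mean(per_unit, axis=0)


def w2_on_grid(q_a, q_b):
    """Approximate the 2-Wasserstein distance between two measures on R.

    Both inputs are quantile functions evaluated on the same uniform grid of
    levels, so the L2 distance between them is a Riemann-sum approximation.
    """
    diff = np.asarray(q_a, dtype=float) - np.asarray(q_b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def simulate_units(n, p, rng):
    """Hypothetical model: nu_i = N(a_i, b_i^2) with random shift and scale."""
    shifts = rng.normal(0.0, 1.0, size=n)    # E[a_i] = 0
    scales = rng.uniform(0.5, 1.5, size=n)   # E[b_i] = 1
    return [a + b * rng.normal(size=p) for a, b in zip(shifts, scales)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Midpoint levels avoid 0 and 1, where the Gaussian quantile is infinite.
    levels = (np.arange(200) + 0.5) / 200
    # Under the assumed model the population barycenter is N(0, 1).
    target = np.array([NormalDist().inv_cdf(u) for u in levels])
    for n, p in [(10, 50), (100, 500), (400, 2000)]:
        q_hat = barycenter_quantiles(simulate_units(n, p, rng), levels)
        print(f"n={n:4d}  p={p:5d}  W2 error ~ {w2_on_grid(q_hat, target):.4f}")
```

Running the script prints the approximate Wasserstein error for increasing $n$ and $p$, which gives an informal picture of the convergence studied in the paper. Here every unit has the same sample size $p$, whereas the paper allows unit-specific sizes $p_{i}$.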

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 2253-2289.

Dates
First available in Project Euclid: 23 July 2018

https://projecteuclid.org/euclid.ejs/1532333005

Digital Object Identifier
doi:10.1214/18-EJS1400

Mathematical Reviews number (MathSciNet)
MR3830834

Zentralblatt MATH identifier
06917476

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62G20: Asymptotic properties

#### Citation

Bigot, Jérémie; Gouet, Raúl; Klein, Thierry; López, Alfredo. Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line. Electron. J. Statist. 12 (2018), no. 2, 2253–2289. doi:10.1214/18-EJS1400. https://projecteuclid.org/euclid.ejs/1532333005
