Bernoulli

Volume 24, Number 4A (2018), 3147–3179.

Wide consensus aggregation in the Wasserstein space. Application to location-scatter families

Pedro C. Álvarez-Esteban, Eustasio del Barrio, Juan A. Cuesta-Albertos, and Carlos Matrán


Abstract

We introduce a general theory for a consensus-based combination of estimates of probability measures. Potential applications include parallelized or distributed sampling schemes as well as variations on aggregation from resampling techniques such as boosting or bagging. To allow for the possibility of highly discrepant estimates, we consider a “wide consensus” procedure instead of a full consensus. The approach is based on trimmed barycenters in the Wasserstein space of probability measures. We provide general existence and consistency results as well as suitable properties of these robustified Fréchet means. To make the approach readily applicable, we also include characterizations of barycenters of probabilities that belong to (not necessarily elliptical) location and scatter families. For these families, we provide an iterative algorithm for the effective computation of trimmed barycenters, based on a consistent algorithm for computing barycenters, which guarantees applicability in a wide range of statistical problems.
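As a rough illustration of the kind of iteration the abstract refers to, the following Python sketch computes a barycenter of laws in a common location-scatter family from their means and covariance matrices, and then alternates that step with a trimming step. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`sqrtm_psd`, `w2_squared`, `barycenter`, `trimmed_barycenter`) are hypothetical, the covariance update is the fixed-point map studied in reference [4], the distance is Gelbrich's formula from reference [28], and the reweighting rule (keep the closest measures, each with weight capped at its original weight divided by 1 − α) is an assumed reading of "trimmed barycenter".

```python
import numpy as np

def sqrtm_psd(a):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def w2_squared(m1, s1, m2, s2):
    """Squared L2-Wasserstein distance between two laws of the same
    location-scatter family (Gelbrich's formula)."""
    r = sqrtm_psd(s1)
    cross = sqrtm_psd(r @ s2 @ r)
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross))

def barycenter(means, covs, weights, n_iter=100, tol=1e-10):
    """Weighted barycenter: mean is the weighted mean of the means,
    covariance is obtained by a fixed-point iteration."""
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = weights @ means
    s = np.einsum("i,ijk->jk", weights, covs)  # positive definite start
    for _ in range(n_iter):
        r = sqrtm_psd(s)
        r_inv = np.linalg.inv(r)
        inner = sum(w * sqrtm_psd(r @ c @ r) for w, c in zip(weights, covs))
        s_new = r_inv @ inner @ inner @ r_inv
        converged = np.linalg.norm(s_new - s) < tol
        s = s_new
        if converged:
            break
    return mean, s

def trimmed_barycenter(means, covs, weights, alpha, n_iter=50):
    """Assumed trimming scheme: alternate between computing the barycenter
    and moving the weight onto the measures closest to it."""
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cap = weights / (1.0 - alpha)  # largest weight allowed after trimming
    w = weights.copy()
    for _ in range(n_iter):
        mean, s = barycenter(means, covs, w)
        d = np.array([w2_squared(mean, s, m, c) for m, c in zip(means, covs)])
        new_w = np.zeros_like(w)
        remaining = 1.0
        for i in np.argsort(d):  # fill from the closest measure outwards
            new_w[i] = min(cap[i], remaining)
            remaining -= new_w[i]
            if remaining <= 1e-12:
                break
        if np.allclose(new_w, w):
            break
        w = new_w
    mean, s = barycenter(means, covs, w)  # barycenter for the final weights
    return mean, s, w
```

For example, calling `trimmed_barycenter` on three estimates with equal weights and `alpha=1/3`, where one estimate has a mean far from the other two, should move all the weight onto the two concordant estimates, which is the "wide consensus" behaviour the abstract describes.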

Article information

Source
Bernoulli, Volume 24, Number 4A (2018), 3147-3179.

Dates
Received: October 2016
Revised: May 2017
First available in Project Euclid: 26 March 2018

Permanent link to this document
https://projecteuclid.org/euclid.bj/1522051236

Digital Object Identifier
doi:10.3150/17-BEJ957

Mathematical Reviews number (MathSciNet)
MR3779713

Zentralblatt MATH identifier
06853276

Keywords
impartial trimming; parallelized inference; robust aggregation; trimmed barycenter; trimmed distributions; Wasserstein distance; wide consensus

Citation

Álvarez-Esteban, Pedro C.; del Barrio, Eustasio; Cuesta-Albertos, Juan A.; Matrán, Carlos. Wide consensus aggregation in the Wasserstein space. Application to location-scatter families. Bernoulli 24 (2018), no. 4A, 3147–3179. doi:10.3150/17-BEJ957. https://projecteuclid.org/euclid.bj/1522051236


