Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

Geodesic PCA in the Wasserstein space by convex PCA

Jérémie Bigot, Raúl Gouet, Thierry Klein, and Alfredo López

Abstract

We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions with respect to an appropriate measure. Therefore, we consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as a basis for the analysis of GPCA and is also of interest in its own right. We provide illustrative examples on simple statistical models to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.
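
As a rough illustration of the isometry mentioned in the abstract: for probability measures on the line, it is a standard fact that the 2-Wasserstein distance equals the L² distance between their quantile functions on (0, 1), and the set of quantile functions (nondecreasing functions) is convex. The Python sketch below assumes NumPy and uses hypothetical helper names (`quantile_embedding`, `wasserstein2`) to approximate this for empirical samples. It is not the authors' construction, which may rely on a different reference measure.

```python
import numpy as np


def quantile_embedding(sample, grid):
    """Evaluate the empirical quantile function of `sample` at the points of `grid`.

    `grid` holds levels in (0, 1); the returned vector is a discretized
    square-integrable function representing the empirical measure.
    """
    return np.quantile(np.asarray(sample, dtype=float), grid)


def wasserstein2(sample_a, sample_b, n_grid=1000):
    """Approximate the 2-Wasserstein distance between two empirical measures on the line.

    Uses the one-dimensional identity W2^2 = integral over (0, 1) of the squared
    difference of quantile functions, discretized on a midpoint grid
    (an illustrative choice, not taken from the paper).
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid
    qa = quantile_embedding(sample_a, grid)
    qb = quantile_embedding(sample_b, grid)
    return float(np.sqrt(np.mean((qa - qb) ** 2)))
```

For instance, shifting a sample by a constant c should give a distance of |c|, since the two quantile functions then differ by c everywhere. Averaging the embedded vectors of several samples gives another nondecreasing vector, which reflects the convexity of the image set.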

Résumé

We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures supported on the real line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population version as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the set of square-integrable functions with respect to an appropriate reference measure. We therefore consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as a basis for the analysis of GPCA. We present several illustrative examples based on simple statistical models to show the benefits of this approach for data analysis. The method is also applied to a real example on population pyramids.

Article information

Source
Ann. Inst. H. Poincaré Probab. Statist., Volume 53, Number 1 (2017), 1-26.

Dates
Received: 9 September 2013
Revised: 17 July 2015
Accepted: 31 July 2015
First available in Project Euclid: 8 February 2017

Permanent link to this document
https://projecteuclid.org/euclid.aihp/1486544882

Digital Object Identifier
doi:10.1214/15-AIHP706

Mathematical Reviews number (MathSciNet)
MR3606732

Zentralblatt MATH identifier
1362.62065

Subjects
Primary: 62G05: Estimation
Secondary: 62G20: Asymptotic properties

Keywords
Wasserstein space; Geodesic and Convex Principal Component Analysis; Fréchet mean; Functional data analysis; Geodesic space; Inference for family of densities

Citation

Bigot, Jérémie; Gouet, Raúl; Klein, Thierry; López, Alfredo. Geodesic PCA in the Wasserstein space by convex PCA. Ann. Inst. H. Poincaré Probab. Statist. 53 (2017), no. 1, 1–26. doi:10.1214/15-AIHP706. https://projecteuclid.org/euclid.aihp/1486544882

