The Annals of Statistics

The spatial distribution in infinite dimensional spaces and related quantiles and depths

Anirvan Chakraborty and Probal Chaudhuri

Full-text: Open access

Abstract

The spatial distribution has been widely used to develop various nonparametric procedures for finite dimensional multivariate data. In this paper, we investigate the concept of spatial distribution for data in infinite dimensional Banach spaces. Many technical difficulties are encountered in such spaces that are primarily due to the noncompactness of the closed unit ball. In this work, we prove some Glivenko–Cantelli and Donsker-type results for the empirical spatial distribution process in infinite dimensional spaces. The spatial quantiles in such spaces can be obtained by inverting the spatial distribution function. A Bahadur-type asymptotic linear representation and the associated weak convergence results for the sample spatial quantiles in infinite dimensional spaces are derived. A study of the asymptotic efficiency of the sample spatial median relative to the sample mean is carried out for some standard probability distributions in function spaces. The spatial distribution can be used to define the spatial depth in infinite dimensional Banach spaces, and we study the asymptotic properties of the empirical spatial depth in such spaces. We also demonstrate the spatial quantiles and the spatial depth using some real and simulated functional data.
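As a rough illustration of the quantities named in the abstract, the sketch below computes an empirical spatial distribution and the corresponding spatial depth for discretized curves. It is not taken from the paper. It assumes the simplest setting, the Hilbert space $L_2[0,1]$ with curves observed on a common equispaced grid, where the spatial distribution at $x$ reduces to the average of the unit vectors $(x - X_i)/\|x - X_i\|$ and the depth is one minus the norm of that average. In a general smooth Banach space the unit vector would be replaced by the Gâteaux derivative of the norm, which this sketch does not attempt. The function names and the Riemann-sum approximation of the norm are hypothetical choices made for illustration.

```python
import numpy as np

def empirical_spatial_distribution(x, sample, dt=1.0):
    """Average of L2 unit vectors pointing from each sample curve to x.

    x      : array of shape (p,), a curve evaluated on a grid (assumed setup)
    sample : array of shape (n, p), n curves on the same grid
    dt     : grid spacing used in the Riemann-sum approximation of the L2 norm
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    diffs = x[None, :] - sample                      # shape (n, p)
    norms = np.sqrt(dt * np.sum(diffs ** 2, axis=1)) # discretized L2 norms
    units = np.zeros_like(diffs)
    keep = norms > 0                                 # convention: sign of zero is zero
    units[keep] = diffs[keep] / norms[keep, None]
    return units.mean(axis=0)

def empirical_spatial_depth(x, sample, dt=1.0):
    """One minus the discretized L2 norm of the empirical spatial distribution."""
    s = empirical_spatial_distribution(x, sample, dt)
    return 1.0 - np.sqrt(dt * np.sum(s ** 2))

if __name__ == "__main__":
    # Simulated curves: random-walk approximations of Brownian motion on [0, 1].
    rng = np.random.default_rng(0)
    p = 100
    dt = 1.0 / p
    curves = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(50, p)), axis=1)
    depths = np.array([empirical_spatial_depth(c, curves, dt) for c in curves])
    print("deepest curve index:", int(np.argmax(depths)))
```

Under these assumptions, curves near the center of the sample receive depth close to one and outlying curves receive depth close to zero, which is the ordering that depth-based tools such as the DD-plot rely on.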

Article information

Source
Ann. Statist., Volume 42, Number 3 (2014), 1203-1231.

Dates
First available in Project Euclid: 20 June 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1403276912

Digital Object Identifier
doi:10.1214/14-AOS1226

Mathematical Reviews number (MathSciNet)
MR3224286

Zentralblatt MATH identifier
1305.62141

Subjects
Primary: 62G05: Estimation
Secondary: 60B12: Limit theorems for vector-valued random variables (infinite-dimensional case); 60G12: General second-order processes

Keywords
Asymptotic relative efficiency; Bahadur representation; DD-plot; Donsker property; Gâteaux derivative; Glivenko–Cantelli property; Karhunen–Loève expansion; smooth Banach space

Citation

Chakraborty, Anirvan; Chaudhuri, Probal. The spatial distribution in infinite dimensional spaces and related quantiles and depths. Ann. Statist. 42 (2014), no. 3, 1203–1231. doi:10.1214/14-AOS1226. https://projecteuclid.org/euclid.aos/1403276912


