The Annals of Statistics

Multivariate spacings based on data depth: I. Construction of nonparametric multivariate tolerance regions

Jun Li and Regina Y. Liu

Full-text: Open access

Abstract

This paper introduces and studies multivariate spacings. The spacings are developed using the order statistics derived from data depth. Specifically, the spacing between two consecutive order statistics is the region which bridges the two order statistics, in the sense that the region contains all the points whose depth values fall between the depth values of the two consecutive order statistics. These multivariate spacings can be viewed as a data-driven realization of the so-called “statistically equivalent blocks.” These spacings assume a form of center-outward layers of “shells” (“rings” in the two-dimensional case), where the shapes of the shells follow closely the underlying probabilistic geometry. The properties and applications of these spacings are studied. In particular, the spacings are used to construct tolerance regions. The construction of tolerance regions is nonparametric and completely data driven, and the resulting tolerance region reflects the true geometry of the underlying distribution. This is different from most existing approaches which require that the shape of the tolerance region be specified in advance. The proposed tolerance regions are shown to meet the prescribed specifications, in terms of β-content and β-expectation. They are also asymptotically minimal under elliptical distributions. Finally, a simulation and comparison study on the proposed tolerance regions is presented.
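The construction described in the abstract can be illustrated with a minimal sketch. Several choices below are assumptions made for illustration and are not taken from the abstract: Mahalanobis depth stands in for whichever depth function is used, the helper names (`mahalanobis_depth`, `spacing_index`, `in_tolerance_region`) are hypothetical, and the number of spacings `r = ceil(beta * (n + 1))` follows the classical rule for statistically equivalent blocks rather than the paper's own specification. The sample is ordered from deepest to least deep, the i-th spacing is the shell of points whose depth lies between the depths of the (i-1)-th and i-th ordered points, and the candidate tolerance region is the union of the r innermost spacings.

```python
import math
import numpy as np


def mahalanobis_depth(points, sample):
    """Mahalanobis depth 1 / (1 + d^2) of each row of `points` relative to `sample`.

    Assumption: this depth is only one possible choice of depth function.
    """
    mean = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = np.atleast_2d(points) - mean
    # Squared Mahalanobis distance of every row from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def spacing_index(point, sample):
    """Index of the depth spacing (shell) containing `point`.

    1 is the innermost shell; n + 1 is the region outside the least deep
    sample point.
    """
    sample_depths = mahalanobis_depth(sample, sample)
    d = mahalanobis_depth(point, sample)[0]
    # Count the sample points that are strictly deeper than `point`.
    return int(np.sum(sample_depths > d)) + 1


def in_tolerance_region(points, sample, beta):
    """Boolean mask: which rows of `points` fall in the union of the r innermost spacings.

    Assumption: r = ceil(beta * (n + 1)), the classical rule for
    statistically equivalent blocks.
    """
    n = len(sample)
    r = math.ceil(beta * (n + 1))
    if not 1 <= r <= n:
        raise ValueError("sample too small for the requested beta")
    # Depth order statistics, deepest first; the r-th one sets the boundary.
    ordered_depths = np.sort(mahalanobis_depth(sample, sample))[::-1]
    threshold = ordered_depths[r - 1]
    return mahalanobis_depth(points, sample) >= threshold
```

For example, with `sample = np.random.default_rng(0).normal(size=(99, 2))`, calling `in_tolerance_region(new_points, sample, 0.9)` flags the rows of `new_points` that are at least as deep as the 90th deepest sample point. Because the region is defined only through depth values, its shape follows the depth contours of the data rather than a shape fixed in advance.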

Article information

Source
Ann. Statist., Volume 36, Number 3 (2008), 1299-1323.

Dates
First available in Project Euclid: 26 May 2008

Permanent link to this document
https://projecteuclid.org/euclid.aos/1211819565

Digital Object Identifier
doi:10.1214/07-AOS505

Mathematical Reviews number (MathSciNet)
MR2418658

Zentralblatt MATH identifier
1360.62253

Subjects
Primary: 62G15 (Tolerance and confidence regions), 62G30 (Order statistics; empirical distribution functions)
Secondary: 62G20 (Asymptotic properties), 62H05 (Characterization and structure theory)

Keywords
Data depth; depth order statistics; multivariate spacings; statistically equivalent blocks; tolerance region

Citation

Li, Jun; Liu, Regina Y. Multivariate spacings based on data depth: I. Construction of nonparametric multivariate tolerance regions. Ann. Statist. 36 (2008), no. 3, 1299–1323. doi:10.1214/07-AOS505. https://projecteuclid.org/euclid.aos/1211819565


