Bernoulli

On data depth and distribution-free discriminant analysis using separating surfaces

Anil K. Ghosh and Probal Chaudhuri


Abstract

A well-known traditional approach in discriminant analysis is to use some linear (or nonlinear) combination of the measurement variables that enhances class separability. For instance, a linear (or a quadratic) classifier finds the linear projection (or the quadratic function) of the measurement variables that maximizes the separation between the classes. These techniques are very useful for obtaining good lower-dimensional views of class separability. Fisher's discriminant analysis, which is primarily motivated by the multivariate normal distribution, uses the first- and second-order moments of the training sample to build such classifiers. These moment estimates, however, are highly sensitive to outliers and unreliable for heavy-tailed distributions. This paper investigates two distribution-free methods for linear classification based on notions of statistical depth. One of these classifiers is closely related to Tukey's half-space depth, while the other is based on the concept of regression depth. Both methods can be generalized to construct nonlinear surfaces for discriminating among competing classes. These depth-based methods assume some finite-dimensional parametric form for the discriminating surface and use the distributional geometry of the data cloud to build the classifier. We use several simulated and real data sets to examine the performance of these discriminant analysis tools, and we study their asymptotic properties under appropriate regularity conditions.
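To make the depth notion at the heart of these methods concrete, here is a minimal sketch (not the authors' procedure) of a depth-based classification rule in Python: it approximates Tukey's half-space depth by Monte Carlo over random projection directions and assigns a test point to the class in whose training cloud it lies deepest. The function names, the random-direction approximation, and the toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

def halfspace_depth(x, data, n_dir=1000, seed=0):
    """Approximate Tukey's half-space depth of point x within a data cloud.

    The exact depth is the minimum, over all unit directions u, of the
    fraction of data points in the closed half-space {z : u.(z - x) >= 0};
    here that minimum is approximated over randomly drawn directions.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_dir, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # unit directions
    proj = (data - x) @ u.T                        # shape (n_points, n_dir)
    return (proj >= 0).mean(axis=0).min()          # worst-case half-space mass

def max_depth_classify(x, class_samples, n_dir=1000):
    """Assign x to the class whose training sample gives it the largest depth."""
    depths = [halfspace_depth(x, s, n_dir) for s in class_samples]
    return int(np.argmax(depths))

# Toy usage: two spherical Gaussian classes differing only in location.
rng = np.random.default_rng(1)
class0 = rng.standard_normal((200, 2))
class1 = rng.standard_normal((200, 2)) + 2.5
label = max_depth_classify(np.array([2.0, 2.0]), [class0, class1])
print(label)  # expected: 1 (the point sits deep inside class 1's cloud)
```

The paper's classifiers go a step further than this simple maximum-depth rule: they fix a finite-dimensional parametric form for the separating surface (a hyperplane in the linear case) and choose its parameters using the depth-induced geometry of the training sample, which is what yields the distribution-free behaviour described in the abstract.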

Article information

Source
Bernoulli, Volume 11, Number 1 (2005), 1-27.

Dates
First available in Project Euclid: 7 March 2005

Permanent link to this document
https://projecteuclid.org/euclid.bj/1110228239

Digital Object Identifier
doi:10.3150/bj/1110228239

Mathematical Reviews number (MathSciNet)
MR2121452

Zentralblatt MATH identifier
1059.62064

Keywords
Bayes risk; elliptic symmetry; generalized U-statistic; half-space depth; linear discriminant analysis; location-shift models; misclassification rates; optimal Bayes classifier; quadratic discriminant analysis; regression depth; robustness; Vapnik-Chervonenkis dimension

Citation

Ghosh, Anil K.; Chaudhuri, Probal. On data depth and distribution-free discriminant analysis using separating surfaces. Bernoulli 11 (2005), no. 1, 1--27. doi:10.3150/bj/1110228239. https://projecteuclid.org/euclid.bj/1110228239


References

  • [1] Albert, A. and Anderson, J.A. (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71, 1-10.
  • [2] Bai, Z.-D. and He, X. (1999) Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist., 27, 1616-1637.
  • [3] Campbell, N.A. and Mahon, R.J. (1974) A multivariate study of variation in two species of rock crab of the genus Leptograpsus. Austral. J. Zoology, 22, 417-425.
  • [4] Chaudhuri, P. and Sengupta, D. (1993) Sign tests in multi-dimension: inference based on the geometry of the data cloud. J. Amer. Statist. Assoc., 88, 1363-1370.
  • [5] Christmann, A. (2002) Classification based on support vector machine and on regression depth. In Y. Dodge (ed.), Statistics and Data Analysis Based on L1-Norm and Related Methods, pp. 341-352. Boston: Birkhäuser.
  • [6] Christmann, A. and Rousseeuw, P. (2001) Measuring overlap in binary regression. Comput. Statist. Data Anal., 37, 65-75.
  • [7] Christmann, A., Fischer, P. and Joachims, T. (2002) Comparison between various regression depth methods and the support vector machine to approximate the minimum number of misclassifications. Comput. Statist., 17, 273-287.
  • [8] Cox, L.H., Johnson, M.M. and Kafadar, K. (1982) Exposition of statistical graphics technology. ASA Proc. Statist. Comput. Section, pp. 55-56.
  • [9] Donoho, D. and Gasko, M. (1992) Breakdown properties of location estimates based on half-space depth and projected outlyingness. Ann. Statist., 20, 1803-1827.
  • [10] Duda, R.O., Hart, P.E. and Stork, D.G. (2000) Pattern Classification. New York: Wiley.
  • [11] Fang, K.-T., Kotz, S. and Ng, K.W. (1989) Symmetric Multivariate and Related Distributions. London: Chapman & Hall.
  • [12] Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7, 179-188.
  • [13] Friedman, J.H. (1989) Regularized discriminant analysis. J. Amer. Statist. Assoc., 84, 165-175.
  • [14] Friedman, J.H. (1996) Another approach to polychotomous classification. Technical Report, Department of Statistics, Stanford University.
  • [15] Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition. New York: Academic Press.
  • [16] Ghosh, A.K. and Chaudhuri, P. (2004) On maximum depth classifiers. Submitted for publication.
  • [17] Hand, D.J. (1981) Discrimination and Classification. New York: Wiley.
  • [18] Hastie, T., Tibshirani, R. and Buja, A. (1994) Flexible discriminant analysis. J. Amer. Statist. Assoc., 89, 1255-1270.
  • [19] Hastie, T. and Tibshirani, R. (1998) Classification by pairwise coupling. Ann. Statist., 26, 451-471.
  • [20] Hastie, T., Tibshirani, R. and Friedman, J.H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York: Springer-Verlag.
  • [21] He, X. and Wang, G. (1997) Convergence of depth contours for multivariate datasets. Ann. Statist., 25, 495-504.
  • [22] Hoeffding, W. (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58, 13-30.
  • [23] Jornsten, R. (2004) Clustering and classification based on L1 data depth. J. Multivariate Anal., 90, 67-89.
  • [24] Jornsten, R., Vardi, Y. and Zhang, C.H. (2002) A robust clustering method and visualization tool based on data depth. In Y. Dodge (ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods, pp. 353-366. Basel: Birkhäuser.
  • [25] Liu, R. (1990) On a notion of data depth based on random simplices. Ann. Statist., 18, 405-414.
  • [26] Liu, R., Parelius, J. and Singh, K. (1999) Multivariate analysis by data depth: descriptive statistics, graphics and inference (with discussion). Ann. Statist., 27, 783-858.
  • [27] McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition. New York: Wiley.
  • [28] Mosler, K. (2002) Multivariate Dispersions, Central Regions and Depth. New York: Springer-Verlag.
  • [29] Nolan, D. (1992) Asymptotics for multivariate trimming. Stochastic Process. Appl., 42, 157-169.
  • [30] Peterson, G.E. and Barney, H.L. (1952) Control methods used in a study of vowels. J. Acoust. Soc. Amer., 24, 175-185.
  • [31] Pollard, D. (1984) Convergence of Stochastic Processes. New York: Springer-Verlag.
  • [32] Reaven, G.M. and Miller, R.G. (1979) An attempt to define the nature of chemical diabetes using a multidimensional analysis. Diabetologia, 16, 17-24.
  • [33] Ripley, B.D. (1994) Neural networks and related methods for classification (with discussion). J. Roy. Statist. Soc. Ser. B, 56, 409-456.
  • [34] Ripley, B.D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
  • [35] Rousseeuw, P.J. and Hubert, M. (1999) Regression depth (with discussion). J. Amer. Statist. Assoc., 94, 388-402.
  • [36] Rousseeuw, P.J. and Ruts, I. (1996) Algorithm AS 307: Bivariate location depth. Appl. Statist., 45, 516-526.
  • [37] Rousseeuw, P.J. and Struyf, A. (1998) Computing location depth and regression depth in higher dimensions. Statist. Comput., 8, 193-203.
  • [38] Santner, T.J. and Duffy, D.E. (1986) A note on A. Albert and J.A. Anderson's conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika, 73, 755-758.
  • [39] Serfling, R. (1980) Approximation Theorems of Mathematical Statistics. New York: Wiley.
  • [40] Serfling, R. (2002) A depth function and a scale curve based on spatial quantiles. In Y. Dodge (ed.), Statistics and Data Analysis Based on L1-Norm and Related Methods, pp. 25-38. Boston: Birkhäuser.
  • [41] Tukey, J.W. (1975) Mathematics and the picturing of data. In R.D. James (ed.), Proceedings of the International Congress of Mathematicians, Vancouver, 1974, Vol. 2, pp. 523-531. Montreal: Canadian Mathematical Congress.
  • [42] Vapnik, V.N. (1998) Statistical Learning Theory. New York: Wiley.
  • [43] Vapnik, V.N. and Chervonenkis, A.Y. (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16, 264-281.
  • [44] Vardi, Y. and Zhang, C.H. (2000) The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97, 1423-1426.
  • [45] Zhu, M. and Hastie, T. (2003) Feature extraction for nonparametric discriminant analysis. J. Comput. Graph. Statist., 12, 101-120.
  • [46] Zuo, Y. and Serfling, R. (2000a) General notions of statistical depth functions. Ann. Statist., 28, 461-482.
  • [47] Zuo, Y. and Serfling, R. (2000b) Structural properties and convergence results for contours of sample statistical depth functions. Ann. Statist., 28, 483-499.