On data depth and distribution-free discriminant analysis using separating surfaces

Anil K. Ghosh and Probal Chaudhuri



A well-known traditional approach in discriminant analysis uses a linear (or nonlinear) combination of the measurement variables to enhance class separability. For instance, a linear (or quadratic) classifier finds the linear projection (or quadratic function) of the measurement variables that maximizes the separation between the classes. These techniques are very useful for obtaining good lower-dimensional views of class separability. Fisher's discriminant analysis, which is primarily motivated by the multivariate normal distribution, uses the first- and second-order moments of the training sample to build such classifiers. These moment estimates, however, are highly sensitive to outliers, and they are unreliable for heavy-tailed distributions. This paper investigates two distribution-free methods for linear classification, which are based on notions of statistical depth functions. One of these classifiers is closely related to Tukey's half-space depth, while the other is based on the concept of regression depth. Both methods can be generalized to construct nonlinear surfaces that discriminate among competing classes. These depth-based methods assume a finite-dimensional parametric form for the discriminating surface and use the distributional geometry of the data cloud to build the classifier. We use simulated and real data sets to examine the performance of these discriminant analysis tools, and we study their asymptotic properties under appropriate regularity conditions.
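As a rough illustration of the contrast drawn above, the sketch below (Python with NumPy; the function names are hypothetical and not from the paper) computes Fisher's moment-based discriminant direction and a brute-force approximation of Tukey's half-space depth in two dimensions. It shows only the ingredients these classifiers build on, not the depth-based classifiers actually proposed in the paper.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher's linear discriminant direction, built entirely from
    first- and second-order sample moments (means and covariances),
    which is why it is sensitive to outliers and heavy tails."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class covariance matrix
    S0, S1 = np.cov(X0, rowvar=False), np.cov(X1, rowvar=False)
    Sw = ((len(X0) - 1) * S0 + (len(X1) - 1) * S1) / (len(X0) + len(X1) - 2)
    w = np.linalg.solve(Sw, m1 - m0)  # direction maximizing class separation
    return w / np.linalg.norm(w)

def halfspace_depth(point, X, n_dir=720):
    """Approximate Tukey half-space depth of `point` in a 2-D cloud X:
    the minimum, over directions, of the fraction of observations lying in
    the closed half-plane through `point` with that normal direction.
    Depth depends only on the geometry of the data cloud, not on moments."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit normals
    proj = (X - point) @ dirs.T  # signed distances along each normal
    return (proj >= 0).mean(axis=0).min()
```

A deep point sits near the centre of the cloud (depth close to 1/2 for angularly symmetric data), while an outlier has depth near 0; the paper's classifiers exploit such depth notions applied to separating surfaces rather than individual points.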

Article information

Bernoulli, Volume 11, Number 1 (2005), 1-27.

First available in Project Euclid: 7 March 2005


Keywords: Bayes risk; elliptic symmetry; generalized U-statistic; half-space depth; linear discriminant analysis; location-shift models; misclassification rates; optimal Bayes classifier; quadratic discriminant analysis; regression depth; robustness; Vapnik-Chervonenkis dimension


Ghosh, Anil K.; Chaudhuri, Probal. On data depth and distribution-free discriminant analysis using separating surfaces. Bernoulli 11 (2005), no. 1, 1--27. doi:10.3150/bj/1110228239.



  • [1] Albert, A. and Anderson, J.A. (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71, 1-10.
  • [2] Bai, Z.-D. and He, X. (1999) Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist., 27, 1616-1637.
  • [3] Campbell, N.A. and Mahon, R.J. (1974) A multivariate study of variation in two species of rock crab of the genus Leptograpsus. Austral. J. Zoology, 22, 417-425.
  • [4] Chaudhuri, P. and Sengupta, D. (1993) Sign tests in multi-dimension: inference based on the geometry of the data cloud. J. Amer. Statist. Assoc., 88, 1363-1370.
  • [5] Christmann, A. (2002) Classification based on support vector machine and on regression depth. In Y. Dodge (ed.), Statistics and Data Analysis Based on L1-Norm and Related Methods, pp. 341-352. Boston: Birkhäuser.
  • [6] Christmann, A. and Rousseeuw, P. (2001) Measuring overlap in binary regression. Comput. Statist. Data Anal., 37, 65-75.
  • [7] Christmann, A., Fischer, P. and Joachims, T. (2002) Comparison between various regression depth methods and the support vector machine to approximate the minimum number of misclassifications. Comput. Statist., 17, 273-287.
  • [8] Cox, L.H., Johnson, M.M. and Kafadar, K. (1982) Exposition of statistical graphics technology. ASA Proc. Statist. Comput. Section, pp. 55-56.
  • [9] Donoho, D. and Gasko, M. (1992) Breakdown properties of location estimates based on half-space depth and projected outlyingness. Ann. Statist., 20, 1803-1827.
  • [10] Duda, R., Hart, P. and Stork, D.G. (2000) Pattern Classification. New York: Wiley.
  • [11] Fang, K.-T., Kotz, S. and Ng, K.W. (1989) Symmetric Multivariate and Related Distributions. London: Chapman & Hall.
  • [12] Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7, 179-188.
  • [13] Friedman, J.H. (1989) Regularized discriminant analysis. J. Amer. Statist. Assoc., 84, 165-175.
  • [14] Friedman, J.H. (1996) Another approach to polychotomous classification. Technical Report, Department of Statistics, Stanford University.
  • [15] Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition. New York: Academic Press.
  • [16] Ghosh, A.K. and Chaudhuri, P. (2004) On maximum depth classifiers. Submitted for publication.
  • [17] Hand, D.J. (1981) Discrimination and Classification. New York: Wiley.
  • [18] Hastie, T., Tibshirani, R. and Buja, A. (1994) Flexible discriminant analysis. J. Amer. Statist. Assoc., 89, 1255-1270.
  • [19] Hastie, T. and Tibshirani, R. (1998) Classification by pairwise coupling. Ann. Statist., 26, 451-471.
  • [20] Hastie, T., Tibshirani, R. and Friedman, J.H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York: Springer-Verlag.
  • [21] He, X. and Wang, G. (1997) Convergence of depth contours for multivariate datasets. Ann. Statist., 25, 495-504.
  • [22] Hoeffding, W. (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58, 13-30.
  • [23] Jornsten, R. (2004) Clustering and classification based on L1 data depth. J. Multivariate Anal., 90, 67-89.
  • [24] Jornsten, R., Vardi, Y. and Zhang, C. H. (2002) A robust clustering method and visualization tool based on data depth. In Y. Dodge (ed.), Statistical Data Analysis, pp. 353-366. Basel: Birkhäuser.
  • [25] Liu, R. (1990) On a notion of data depth based on random simplices. Ann. Statist., 18, 405-414.
  • [26] Liu, R., Parelius, J. and Singh, K. (1999) Multivariate analysis by data depth: descriptive statistics, graphics and inference. Ann. Statist., 27, 783-858.
  • [27] McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition. New York: Wiley.
  • [28] Mosler, K. (2002) Multivariate Dispersions, Central Regions and Depth. New York: Springer-Verlag.
  • [29] Nolan, D. (1992) Asymptotics for multivariate trimming. Stochastic Process. Appl., 42, 157-169.
  • [30] Peterson, G.E. and Barney, H.L. (1952) Control methods used in a study of vowels. J. Acoust. Soc. Amer., 24, 175-185.
  • [31] Pollard, D. (1984) Convergence of Stochastic Processes. New York: Springer-Verlag.
  • [32] Reaven, G.M. and Miller, R.G. (1979) An attempt to define the nature of chemical diabetes using a multidimensional analysis. Diabetologia, 16, 17-24.
  • [33] Ripley, B.D. (1994) Neural networks and related methods for classification (with discussion). J. Roy. Statist. Soc. Ser. B, 56, 409-456.
  • [34] Ripley, B.D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
  • [35] Rousseeuw, P.J. and Hubert, M. (1999) Regression depth (with discussion). J. Amer. Statist. Assoc., 94, 388-402.
  • [36] Rousseeuw, P.J. and Ruts, I. (1996) Algorithm AS 307: Bivariate location depth. Appl. Statist., 45, 516-526.
  • [37] Rousseeuw, P.J. and Struyf, A. (1998) Computing location depth and regression depth in higher dimensions. Statist. Comput., 8, 193-203.
  • [38] Santner, T.J. and Duffy, D.E. (1986) A note on A. Albert and J.A. Anderson's conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika, 73, 755-758.
  • [39] Serfling, R. (1980) Approximation Theorems of Mathematical Statistics. New York: Wiley.
  • [40] Serfling, R. (2002) A depth function and a scale curve based on spatial quantiles. In Y. Dodge (ed.), Statistics and Data Analysis Based on L1-Norm and Related Methods, pp. 25-38. Boston: Birkhäuser.
  • [41] Tukey, J.W. (1975) Mathematics and the picturing of data. In R.D. James (ed.), Proceedings of the International Congress of Mathematicians, Vancouver 1974, Canadian Mathematical Congress, Montreal, Que., Vol. 2, pp. 523-531.
  • [42] Vapnik, V.N. (1998) Statistical Learning Theory. New York: Wiley.
  • [43] Vapnik, V.N. and Chervonenkis, A.Y. (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16, 264-281.
  • [44] Vardi, Y. and Zhang, C.H. (2000) The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97, 1423-1426.
  • [45] Zhu, M. and Hastie, T. (2003) Feature extraction for nonparametric discriminant analysis. J. Comput. Graph. Statist., 12, 101-120.
  • [46] Zuo, Y. and Serfling, R. (2000a) General notions of statistical depth functions. Ann. Statist., 28, 461-482.
  • [47] Zuo, Y. and Serfling, R. (2000b) Structural properties and convergence results for contours of sample statistical depth functions. Ann. Statist., 28, 483-499.