Electronic Journal of Statistics

A note on sensitivity of principal component subspaces and the efficient detection of influential observations in high dimensions

Luke A. Prendergast

Full-text: Open access

Abstract

In this paper we introduce an influence measure based on second-order expansions of the RV and GCD measures for comparing the unperturbed and perturbed eigenvectors of a symmetric matrix estimator. Example estimators are considered to highlight how this measure complements recent influence analysis. Importantly, we also show how a sample-based version of this measure can be used to detect influential observations accurately and efficiently in practice.
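The detection task in the abstract can be illustrated with a brute-force baseline. The Python sketch below is not the paper's second-order influence measure. It uses the ordinary sample covariance as the symmetric matrix estimator (an assumption). It compares the leading k-dimensional principal component subspaces with Yanai's generalized coefficient of determination, GCD = tr(P_A P_B)/k for two k-dimensional subspaces with orthogonal projections P_A and P_B, and simply deletes each observation in turn. The helper names `top_k_eigvecs`, `gcd` and `leave_one_out_gcd` are hypothetical. The loop costs n extra eigendecompositions, which is the expense a sample-based influence measure is meant to avoid in high dimensions.

```python
import numpy as np

def top_k_eigvecs(S, k):
    """Orthonormal eigenvectors of symmetric S for its k largest eigenvalues."""
    vals, vecs = np.linalg.eigh(S)         # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # indices of the k largest eigenvalues
    return vecs[:, order]

def gcd(A, B):
    """Yanai's GCD between span(A) and span(B).

    Assumes A and B are p x k with orthonormal columns, so the projections are
    A A^T and B B^T and tr(P_A P_B) = ||A^T B||_F^2. The result lies in [0, 1];
    1 means the two subspaces coincide.
    """
    k = A.shape[1]
    return np.linalg.norm(A.T @ B, "fro") ** 2 / k

def leave_one_out_gcd(X, k):
    """GCD between the full-sample k-dim PC subspace and the subspace obtained
    after deleting each observation (brute force: one eigendecomposition per row)."""
    n = X.shape[0]
    V_full = top_k_eigvecs(np.cov(X, rowvar=False), k)
    scores = np.empty(n)
    for i in range(n):
        X_minus_i = np.delete(X, i, axis=0)
        V_i = top_k_eigvecs(np.cov(X_minus_i, rowvar=False), k)
        scores[i] = gcd(V_full, V_i)
    return scores

# Usage on simulated data with one planted outlier along a low-variance axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) * np.array([3.0, 2.0, 1.0, 1.0, 1.0])
X[0] = [0.0, 0.0, 0.0, 25.0, 0.0]
scores = leave_one_out_gcd(X, k=2)
print("most influential observation:", int(np.argmin(scores)))
```

Small GCD values flag observations whose removal rotates the estimated subspace; here the planted outlier (observation 0) should stand out. Swapping `np.cov` for a robust scatter estimator is a natural variation that is not shown.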

Article information

Source
Electron. J. Statist., Volume 2 (2008), 454-467.

Dates
First available in Project Euclid: 26 June 2008

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1214491851

Digital Object Identifier
doi:10.1214/08-EJS201

Mathematical Reviews number (MathSciNet)
MR2417389

Zentralblatt MATH identifier
1320.62140

Subjects
Primary: 62F35: Robustness and adaptive procedures
Secondary: 62H12: Estimation

Keywords
distance between subspaces; influential observations; perturbation; principal component analysis

Citation

Prendergast, Luke A. A note on sensitivity of principal component subspaces and the efficient detection of influential observations in high dimensions. Electron. J. Statist. 2 (2008), 454–467. doi:10.1214/08-EJS201. https://projecteuclid.org/euclid.ejs/1214491851




See also

  • Related item: Bénasséni, J. (2014). Sensitivity of principal component subspaces: A comment on Prendergast’s paper. Electron. J. Statist. 8 927–930.