Brazilian Journal of Probability and Statistics

Searching for the core variables in principal components analysis

Yanina Gimenez and Guido Giussani

Abstract

In this article, we introduce a procedure for selecting variables in principal components analysis. It is designed to identify a small subset of the original variables that best explain the principal components through nonparametric relationships. A dataset usually contains some noisy, uninformative variables, as well as variables that are strongly related to one another because of their general dependence. The procedure is intended to be used after a satisfactory initial principal components analysis with all variables, and its aim is to help interpret the underlying structures. We analyze the asymptotic behavior of the method and provide some examples.
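
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, whose details are not given in this record. It assumes one plausible reading: compute the first principal component from all variables, then pick the small subset of variables whose k-nearest-neighbour regression predicts those component scores with the lowest leave-one-out error. All function names, the choice of k-NN, the exhaustive search, and the synthetic data are assumptions made for illustration.

```python
import itertools
import math
import random


def first_pc_scores(X, iters=200):
    """Scores of each row on the first principal component (power iteration)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance matrix of the centered data (p x p).
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)]
         for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(z[j] * v[j] for j in range(p)) for z in Z]


def loo_knn_mse(X, y, subset, k=5):
    """Leave-one-out squared error of k-NN regression of y on X[:, subset]."""
    n = len(X)
    total = 0.0
    for i in range(n):
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in subset), m)
            for m in range(n) if m != i
        )
        pred = sum(y[m] for _, m in dists[:k]) / k
        total += (y[i] - pred) ** 2
    return total / n


def best_subset(X, size, k=5):
    """Exhaustively search subsets of a given size (assumed small)."""
    y = first_pc_scores(X)
    p = len(X[0])
    return min(itertools.combinations(range(p), size),
               key=lambda s: loo_knn_mse(X, y, s, k))


def make_demo_data(n=120, seed=0):
    """Hypothetical data: columns 0 and 1 share a latent factor, 2-4 are noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(0.0, 3.0)
        rows.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)]
                    + [rng.gauss(0.0, 0.5) for _ in range(3)])
    return rows


if __name__ == "__main__":
    X = make_demo_data()
    print("selected subset:", best_subset(X, size=1))
```

In the synthetic data, columns 0 and 1 are nearly redundant and the remaining columns are pure noise, which mirrors the two situations the abstract mentions. The exhaustive search is only practical for a handful of variables; a larger problem would need a greedy or otherwise restricted search.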

Article information

Source
Braz. J. Probab. Stat., Volume 32, Number 4 (2018), 730-754.

Dates
Received: March 2015
Accepted: April 2017
First available in Project Euclid: 17 August 2018

Permanent link to this document
https://projecteuclid.org/euclid.bjps/1534492899

Digital Object Identifier
doi:10.1214/17-BJPS361

Mathematical Reviews number (MathSciNet)
MR3845027

Zentralblatt MATH identifier
06979598

Keywords
Informative variables; multivariate analysis; principal components; selection of variables

Citation

Gimenez, Yanina; Giussani, Guido. Searching for the core variables in principal components analysis. Braz. J. Probab. Stat. 32 (2018), no. 4, 730–754. doi:10.1214/17-BJPS361. https://projecteuclid.org/euclid.bjps/1534492899

