## The Annals of Applied Statistics

### Orthogonal simple component analysis: A new, exploratory approach

#### Abstract

Combining principles with pragmatism, a new approach and accompanying algorithm are presented for a longstanding problem in applied statistics: the interpretation of principal components. Following Rousson and Gasser [Appl. Statist. 53 (2004) 539–555], the ultimate goal is not to propose a method that leads automatically to a unique solution, but rather to develop tools for assisting the user in his or her choice of an interpretable solution.

Accordingly, our approach is essentially exploratory. Calling a vector ‘simple’ if it has small integer elements, it poses the open question: What sets of simply interpretable orthogonal axes—if any—are angle-close to the principal components of interest?

Its answer is presented in summary form as an automated visual display of the solutions found, ordered in terms of overall measures of simplicity, accuracy and star quality, from which the user may choose. Here, ‘star quality’ refers to striking overall patterns in the sets of axes found, deserving to be especially drawn to the user’s attention precisely because they have emerged from the data, rather than being imposed on it by (implicitly) adopting a model. Indeed, other things being equal, explicit models can be checked by seeing whether their fits occur in our exploratory analysis, as we illustrate. By requiring orthogonality, the attractive visualization and dimension-reduction features of principal component analysis are retained.

Exact implementation of this principled approach is shown to provide an exhaustive set of solutions, but is combinatorially hard. Pragmatically, we provide an efficient, approximate algorithm. Throughout, worked examples show how this new tool adds to the applied statistician’s armoury, effectively combining simplicity, retention of optimality and computational efficiency, while complementing existing methods. Examples are also given where simple structure in the population principal components is recovered using only information from the sample. Further developments are briefly indicated.
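To make the abstract's central notion concrete, the sketch below illustrates what ‘angle-closeness’ between a ‘simple’ axis (small integer elements) and a sample principal component might look like numerically. This is only a toy illustration of the criterion, not the authors' algorithm: the paper searches over whole sets of mutually orthogonal simple axes, which is the combinatorially hard part. The helper name `angle_to_pc` and the toy data are our own assumptions.

```python
import numpy as np

def angle_to_pc(simple_axis, pc):
    """Angle in degrees between a normalized 'simple' integer axis and a
    principal component, ignoring sign (an axis and its negative coincide)."""
    u = simple_axis / np.linalg.norm(simple_axis)
    v = pc / np.linalg.norm(pc)
    cos = np.clip(abs(u @ v), 0.0, 1.0)
    return np.degrees(np.arccos(cos))

# Toy data: variable 1 is correlated with variable 0; variable 2 is independent.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
x[:, 1] += x[:, 0]

# Sample principal components are the eigenvectors of the covariance matrix;
# np.linalg.eigh returns eigenvalues in ascending order, so the last column
# is the leading component.
_, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
pc1 = vecs[:, -1]

# Candidate 'simple' axes with small integer elements.
candidates = [np.array([1, 1, 0]), np.array([1, -1, 0]), np.array([0, 0, 1])]
angles = [angle_to_pc(c, pc1) for c in candidates]
best = candidates[int(np.argmin(angles))]  # here [1, 1, 0] is nearest to pc1
```

An exhaustive version of the paper's question would enumerate all orthogonal sets of such integer axes and rank them by simplicity and accuracy, which grows combinatorially with dimension — hence the approximate algorithm the abstract describes.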

#### Article information

**Source**
Ann. Appl. Stat., Volume 5, Number 1 (2011), 486–522.

**Dates**
First available in Project Euclid: 21 March 2011

**Permanent link**
https://projecteuclid.org/euclid.aoas/1300715200

**Digital Object Identifier**
doi:10.1214/10-AOAS374

**Mathematical Reviews number (MathSciNet)**
MR2810407

**Zentralblatt MATH identifier**
1220.62074

#### Citation

Anaya-Izquierdo, Karim; Critchley, Frank; Vines, Karen. Orthogonal simple component analysis: A new, exploratory approach. Ann. Appl. Stat. 5 (2011), no. 1, 486–522. doi:10.1214/10-AOAS374. https://projecteuclid.org/euclid.aoas/1300715200

#### References

• Chipman, H. A. and Gu, H. (2005). Interpretable dimension reduction. J. Appl. Statist. 32 969–987.
• D’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448.
• Fang, K.-T. and Li, R.-Z. (1997). Some methods for generating both an NT-net and the uniform distribution on a Stiefel manifold and their applications. Comput. Statist. Data Anal. 24 29–46.
• Farcomeni, A. (2009). An exact approach to sparse principal component analysis. Comput. Statist. 24 583–604.
• Gervini, D. and Rousson, V. (2004). Criteria for evaluating dimension-reducing components for multivariate data. Amer. Statist. 58 72–76.
• Hausman, R. E. (1982). Constrained multivariate analysis. In Optimization in Statistics. Studies in the Management Sciences 19 137–151. North-Holland, Amsterdam.
• Jeffers, J. N. R. (1967). Two case studies in the application of principal component analysis. Appl. Statist. 16 225–236.
• Jolliffe, I. T. (2002). Principal Component Analysis. Springer, New York.
• Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
• Kolda, T. G. and O’Leary, D. P. (1998). A semidiscrete matrix decomposition for latent semantic indexing in information retrieval. ACM Trans. Inform. Syst. 16 322–346.
• Lazzeroni, L. and Owen, A. (2002). Plaid models for gene expression data. Statist. Sinica 12 61–86.
• Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401 788–791.
• Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.
• Park, T. (2005). A penalized likelihood approach to rotation of principal components. J. Comput. Graph. Statist. 14 867–888.
• Rousson, V. and Gasser, T. (2003). Some case studies of simple component analysis. Unpublished manuscript.
• Rousson, V. and Gasser, T. (2004). Simple component analysis. Appl. Statist. 53 539–555.
• Sjöstrand, K., Stegmann, M. B. and Larsen, R. (2006). Sparse principal component analysis in medical shape modeling. In International Society for Optical Engineering (SPIE) 1579–1590.
• Sun, L. (2006). Simple principal components. Ph.D. thesis, Open Univ.
• Thompson, M. O., Vines, S. K. and Harrington, K. (1999). Assessment of blood volume flow in the uterine artery: The influence of arterial distensibility and waveform abnormality. Ultrasound in Obstetrics and Gynecology 14 71.
• Trendafilov, N. T. and Jolliffe, I. T. (2007). DALASS: Variable selection in discriminant analysis via the LASSO. Comput. Statist. Data Anal. 51 3718–3736.
• Vines, S. K. (2000). Simple principal components. Appl. Statist. 49 441–451.
• Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.