The Annals of Applied Statistics

The duality diagram in data analysis: Examples of modern applications

Omar De la Cruz and Susan Holmes

Full-text: Open access


Today’s data-heavy research environment requires integrating different sources of information into structured data sets that cannot be analyzed as simple matrices. We introduce an old technique, known in European data analysis circles as the Duality Diagram Approach, and put it to new uses through a variety of metrics and ways of combining different diagrams. This issue of the Annals of Applied Statistics contains contemporary examples of how this approach provides solutions to hard problems in data integration. We present here the genesis of the technique and show how it can be seen as a precursor of modern kernel-based approaches.

Article information

Ann. Appl. Stat., Volume 5, Number 4 (2011), 2266-2277.

First available in Project Euclid: 20 December 2011

Keywords: Duality; gPCA; generalized SVD; kernel methods; RV coefficient


De la Cruz, Omar; Holmes, Susan. The duality diagram in data analysis: Examples of modern applications. Ann. Appl. Stat. 5 (2011), no. 4, 2266–2277. doi:10.1214/10-AOAS408.


