Brazilian Journal of Probability and Statistics

Manifold matching: Joint optimization of fidelity and commensurability

Carey E. Priebe, David J. Marchette, Zhiliang Ma, and Sancar Adali

Abstract

Fusion and inference from multiple and massive disparate data sources—the requirement for our most challenging data analysis problems and the goal of our most ambitious statistical pattern recognition methodologies—has many and varied aspects which are currently the target of intense research and development. One aspect of the overall challenge is manifold matching—identifying embeddings of multiple disparate data spaces into the same low-dimensional space where joint inference can be pursued. We investigate this manifold matching task from the perspective of jointly optimizing the fidelity of the embeddings and their commensurability with one another, with a specific statistical inference exploitation task in mind. Our results demonstrate when and why our joint optimization methodology is superior to either version of separate optimization. The methodology is illustrated with simulations and an application in document matching.
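
As a concrete illustration of the two criteria named in the abstract, the sketch below embeds two dissimilarity matrices for the same n objects into a common low-dimensional space via an omnibus matrix and classical multidimensional scaling, then reports a fidelity term (how well each embedding preserves its own dissimilarities) and a commensurability term (how close the two embeddings of each matched object land in the shared space). This is a minimal sketch of the general idea only, not the authors' estimator; the averaging used to impute the cross-source block and the choice of classical MDS are simplifying assumptions made here.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
n, d = 50, 2

# Matched data: the same latent objects observed through two noisy "sources".
latent = rng.normal(size=(n, d))
X1 = latent + 0.1 * rng.normal(size=(n, d))
X2 = latent + 0.1 * rng.normal(size=(n, d))
D1 = squareform(pdist(X1))
D2 = squareform(pdist(X2))

# Omnibus dissimilarity matrix; the off-diagonal block linking the two
# sources is imputed by simple averaging (an assumption for this sketch).
W = (D1 + D2) / 2.0
M = np.block([[D1, W], [W, D2]])

def cmds(D, dim):
    """Classical multidimensional scaling of a dissimilarity matrix."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ (D ** 2) @ J          # double centering
    top = np.argsort(np.linalg.eigh(B)[0])[::-1][:dim]
    vals, vecs = np.linalg.eigh(B)
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

Y = cmds(M, d)
Y1, Y2 = Y[:n], Y[n:]                    # joint embeddings of the two sources

# Fidelity: squared error between each source's dissimilarities and the
# interpoint distances of its embedding.
fidelity = (np.sum((squareform(pdist(Y1)) - D1) ** 2)
            + np.sum((squareform(pdist(Y2)) - D2) ** 2))
# Commensurability: squared distance between the matched embedded points.
commensurability = np.sum((Y1 - Y2) ** 2)

print(f"fidelity error: {fidelity:.2f}, "
      f"commensurability error: {commensurability:.2f}")
```

Because both sources are embedded from one omnibus matrix, the fidelity and commensurability errors are traded off jointly rather than optimized separately, which is the contrast the paper investigates.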

Article information

Source
Braz. J. Probab. Stat., Volume 27, Number 3 (2013), 377–400.

Dates
First available in Project Euclid: 28 May 2013

Permanent link to this document
https://projecteuclid.org/euclid.bjps/1369746499

Digital Object Identifier
doi:10.1214/12-BJPS188

Mathematical Reviews number (MathSciNet)
MR3064729

Zentralblatt MATH identifier
1298.62102

Keywords
Fusion; inference; multiple disparate datasets

Citation

Priebe, Carey E.; Marchette, David J.; Ma, Zhiliang; Adali, Sancar. Manifold matching: Joint optimization of fidelity and commensurability. Braz. J. Probab. Stat. 27 (2013), no. 3, 377–400. doi:10.1214/12-BJPS188. https://projecteuclid.org/euclid.bjps/1369746499

