The Annals of Applied Statistics

Matching markers and unlabeled configurations in protein gels

Kanti V. Mardia, Emma M. Petty, and Charles C. Taylor

Abstract

Unlabeled shape analysis is a rapidly emerging and challenging area of statistics. This has been driven by various novel applications in bioinformatics. We consider here the situation where two configurations are matched under various constraints, namely, the configurations have a subset of manually located “markers” with high probability of matching each other while a larger subset consists of unlabeled points. We consider a plausible model and give an implementation using the EM algorithm. The work is motivated by a real experiment of gels for renal cancer and our approach allows for the possibility of missing and misallocated markers. The methodology is successfully used to automatically locate and remove a grossly misallocated marker within the given data set.

Article information

Source
Ann. Appl. Stat., Volume 6, Number 3 (2012), 853-869.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1346418565

Digital Object Identifier
doi:10.1214/12-AOAS544

Mathematical Reviews number (MathSciNet)
MR3012512

Zentralblatt MATH identifier
06096513

Keywords
Electrophoresis; shape; Western Blots

Citation

Mardia, Kanti V.; Petty, Emma M.; Taylor, Charles C. Matching markers and unlabeled configurations in protein gels. Ann. Appl. Stat. 6 (2012), no. 3, 853–869. doi:10.1214/12-AOAS544. https://projecteuclid.org/euclid.aoas/1346418565


References

  • Banks, R. E., Dunn, M. J., Hochstrasser, D. F., Sanchez, J. C., Blackstock, W., Pappin, D. J. and Selby, P. J. (2000). Proteomics: New perspectives, new biomedical opportunities. Lancet 356 1749–1756.
  • Berkelaar, M. (2008). Interface to lp_solve v. 5.5 to solve linear/integer programs, R package.
  • Besl, P. J. and McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Trans. PAMI 14 239–256.
  • Chen, P. (2011). A novel kernel correlation model with the correspondence estimation. J. Math. Imaging Vision 39 100–120.
  • Chui, H. and Rangarajan, A. (2003). A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding 89 114–141.
  • Czogiel, I., Dryden, I. L. and Brignell, C. J. (2011). Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment. Ann. Appl. Stat. 5 2603–2629.
  • Dryden, I. L., Hirst, J. D. and Melville, J. L. (2007). Statistical analysis of unlabeled point sets: Comparing molecules in chemoinformatics. Biometrics 63 237–251, 315.
  • Dryden, I. L. and Mardia, K. V. (1998). Statistical Shape Analysis. Wiley, Chichester.
  • Dryden, I. L. and Walker, G. (1999). Highly resistant regression and object matching. Biometrics 55 820–825.
  • Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. ACM 24 381–395.
  • Forgber, M., Gellrich, S., Sharav, T., Sterry, W. and Walden, P. (2009). Proteome-based analysis of serologically defined tumor-associated antigens in cutaneous lymphoma. PLoS ONE 4 e8376.
  • Glaunes, J., Trouvé, A. and Younes, L. (2004). Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching. CVPR 2 712–718.
  • Green, P. J. and Mardia, K. V. (2006). Bayesian alignment using hierarchical models, with applications in protein bioinformatics. Biometrika 93 235–254.
  • Green, P. J., Mardia, K. V., Nyirongo, V. B. and Ruffieux, Y. (2010). Bayesian modelling for matching and alignment of biomolecules. In The Oxford Handbook of Applied Bayesian Analysis 27–50. Oxford Univ. Press, Oxford.
  • Kent, J. T., Mardia, K. V. and Taylor, C. C. (2010a). Matching unlabelled configurations and protein bioinformatics. Research Report STAT10-01. Univ. Leeds, Leeds, UK.
  • Kent, J. T., Mardia, K. V. and Taylor, C. C. (2010b). An EM interpretation of the Softassign algorithm for alignment problems. In LASR10—High-throughput sequencing, proteins and statistics (A. Gusnanto, K. V. Mardia, C. J. Fallaize and J. Voss, eds.) 29–32. Dept. Statistics, Univ. Leeds, Leeds, UK.
  • Mardia, K. V., Petty, E. M. and Taylor, C. C. (2012). Supplement to “Matching markers and unlabeled configurations in protein gels.” DOI:10.1214/12-AOAS544SUPP.
  • McLachlan, G. J. and Krishnan, T. (2008). The EM Algorithm and Extensions, 2nd ed. Wiley, Hoboken, NJ.
  • Murphy-Chutorian, E. and Trivedi, M. M. (2008). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 607–626.
  • Petty, E. M. (2009). Shape analysis in bioinformatics. Ph.D. thesis, Univ. Leeds, Leeds, UK.
  • Rangarajan, A., Chui, H. and Bookstein, F. L. (1997). The Softassign Procrustes matching algorithm. In Information Processing in Medical Imaging 15th International Conference, IPMI’97 Poultney 29–42. Springer, New York.
  • Rohr, K., Cathier, P. and Wörz, S. (2004). Elastic registration of electrophoresis images using intensity information and point landmarks. Pattern Recognition 37 1035–1048.
  • Taylor, C. C., Mardia, K. V. and Kent, J. T. (2003). Matching unlabelled configurations using the EM algorithm. In LASR Proceedings: Stochastic Geometry, Biological Structure and Images (R. G. Aykroyd, K. V. Mardia and M. J. Langdon, eds.) 19–21. Dept. Statistics, Univ. Leeds, Leeds, UK.
  • Tsin, Y. and Kanade, T. (2004). A correlation-based approach to robust point set registration. In Computer Vision—ECCV. Lecture Notes in Comput. Sci. 3023 558–569. Springer, Berlin.
  • Walker, G. (2000). Robust, non-parametric and automatic methods for matching spatial point patterns. Ph.D. thesis, Univ. Leeds, Leeds, UK.
  • Zvelebil, M. and Baum, J. O. (2007). Understanding Bioinformatics. Garland Science, New York.

Supplemental materials

  • Supplementary material: Western Blot data. The supplementary data contains a zipped file which includes information taken from 28 Western Blots. This represents 8 subjects (four controls and four patients) treated with two possible treatments. A replicate image is also obtained for each subject-treatment combination, though some replicates are missing. Further details are included in the associated README file.