Electronic Journal of Statistics

Analysis of proteomics data: An improved peak alignment approach

Ian Zhang and Xueli Liu

Full-text: Open access


Mass spectrometry (MS) data have become common in recent years. Before other statistical inferential procedures are applied, the spectra may need to be aligned so that intensities of the same protein/peptide are accurately located and identified. However, the enormous number of peaks poses a challenge in handling such data, and direct application of available curve alignment methods often does not produce satisfactory results. In this work, we propose an Automated Pairwise Piecewise Landmark Registration (APPLR) method for aligning MS data. For a pair of spectra, the most prominent peaks are given priority and aligned first. A weighted Gaussian kernel-based similarity score is used to test-warp these top peaks, and the spectra are then aligned according to the best match. The algorithm is applied iteratively until all spectra are aligned. We illustrate the new method and two other curve alignment methods on the unlabeled total ion count data.
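
The idea of scoring candidate alignments of the top peaks with a weighted Gaussian kernel can be illustrated with a short sketch. This is not the APPLR implementation: the abstract does not give the form of the score or the warping family, so the example below assumes an intensity-weighted sum of Gaussian kernels over peak-location differences, and it replaces piecewise warping with a single global shift for brevity. The function names and the parameters `sigma` and `n_top` are hypothetical.

```python
import numpy as np

def gaussian_similarity(peaks_a, peaks_b, sigma=1.0):
    """Intensity-weighted Gaussian-kernel similarity between two peak lists.

    Each peak list is an array of (location, intensity) rows.
    The exact form of this score is an assumption, not taken from the paper.
    """
    a = np.asarray(peaks_a, dtype=float)
    b = np.asarray(peaks_b, dtype=float)
    diff = a[:, 0][:, None] - b[:, 0][None, :]      # pairwise location differences
    kernel = np.exp(-0.5 * (diff / sigma) ** 2)      # Gaussian kernel on locations
    weights = a[:, 1][:, None] * b[:, 1][None, :]    # product of peak intensities
    return float(np.sum(weights * kernel))

def best_shift(reference, target, n_top=3, sigma=1.0):
    """Test candidate shifts that map a top target peak onto a top reference peak.

    Returns (shift, score) for the candidate with the highest similarity.
    A global shift stands in for the piecewise warp described in the abstract.
    """
    ref = np.asarray(reference, dtype=float)
    tgt = np.asarray(target, dtype=float)
    # Select the n_top most intense peaks of each spectrum.
    top_ref = ref[np.argsort(ref[:, 1])[::-1][:n_top]]
    top_tgt = tgt[np.argsort(tgt[:, 1])[::-1][:n_top]]
    # Start from the unshifted target as the baseline candidate.
    best = (0.0, gaussian_similarity(ref, tgt, sigma))
    for r in top_ref[:, 0]:
        for t in top_tgt[:, 0]:
            shift = r - t
            shifted = tgt.copy()
            shifted[:, 0] += shift
            score = gaussian_similarity(ref, shifted, sigma)
            if score > best[1]:
                best = (shift, score)
    return best

# Toy example: the target is the reference displaced by 2 location units.
reference = np.array([[10.0, 5.0], [20.0, 3.0], [35.0, 1.0]])
target = np.array([[12.0, 5.0], [22.0, 3.0], [37.0, 1.0]])
shift, score = best_shift(reference, target)
```

In this toy example the selected shift undoes the displacement of the target. A piecewise version would apply a separate warp between consecutive matched landmarks rather than one shift for the whole spectrum.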

Article information

Electron. J. Statist., Volume 8, Number 2 (2014), 1748-1755.

First available in Project Euclid: 29 October 2014

Keywords: Curve alignment; functional data; landmark registration; pairwise; spectrometry data; time warping


Zhang, Ian; Liu, Xueli. Analysis of proteomics data: An improved peak alignment approach. Electron. J. Statist. 8 (2014), no. 2, 1748–1755. doi:10.1214/14-EJS900E. https://projecteuclid.org/euclid.ejs/1414588158



See also

  • Related item: Koch, I., Hoffmann, P., Marron, J. S. (2014). Proteomics profiles from mass spectrometry. Electron. J. Statist. 8(2) 1703–1713.