A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics

Bibliographic Details
Published in	Journal of proteome research Vol. 21; no. 6; pp. 1566 - 1574
Main Authors	Luo, Xiyang; Bittremieux, Wout; Griss, Johannes; Deutsch, Eric W.; Sachsenberg, Timo; Levitsky, Lev I.; Ivanov, Mark V.; Bubis, Julia A.; Gabriels, Ralf; Webel, Henry; Sanchez, Aniel; Bai, Mingze; Käll, Lukas; Perez-Riverol, Yasset
Format	Journal Article
Language	English
Published	United States: American Chemical Society, 03.06.2022
Subjects	benchmark; big data; clustering; consensus spectra; mass spectrometry; Natural Sciences; Other Natural Sciences; Other Natural Sciences not elsewhere specified; PRIDE database; ProteomeXchange; spectral libraries; Technical Note
Summary:	Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
ISSN:	1535-3893, 1535-3907
DOI:	10.1021/acs.jproteome.2c00069
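
The summary names spectrum binning as one of the benchmarked consensus strategies without describing it. The sketch below illustrates the general idea only and is not the authors' implementation: the function name `bin_consensus`, the parameters `bin_width` and `min_fraction`, and the representation of a spectrum as a list of `(mz, intensity)` tuples are all assumptions made for illustration. Peaks from every spectrum in a cluster are assigned to fixed-width m/z bins; a bin is kept when enough spectra contribute to it, and it is reported with an intensity-weighted m/z and an intensity averaged over the cluster.

```python
from collections import defaultdict


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a consensus spectrum from a cluster by m/z binning.

    Hypothetical helper for illustration, not the paper's code.
    spectra: list of spectra, each a list of (mz, intensity) tuples.
    Returns a list of (mz, intensity) tuples sorted by m/z.
    """
    # bin index -> [summed intensity, summed mz * intensity, contributing spectra]
    bins = defaultdict(lambda: [0.0, 0.0, set()])
    for idx, peaks in enumerate(spectra):
        for mz, intensity in peaks:
            entry = bins[int(mz / bin_width)]
            entry[0] += intensity
            entry[1] += mz * intensity
            entry[2].add(idx)

    n_spectra = len(spectra)
    consensus = []
    for bin_index in sorted(bins):
        total, weighted, members = bins[bin_index]
        # Keep only bins supported by a sufficient share of the cluster.
        if total > 0 and len(members) / n_spectra >= min_fraction:
            consensus.append((weighted / total, total / n_spectra))
    return consensus


cluster = [
    [(100.0, 10.0), (200.0, 5.0)],
    [(100.0, 30.0)],
    [(100.0, 20.0), (300.0, 1.0)],
]
result = bin_consensus(cluster, bin_width=1.0, min_fraction=0.5)
```

In this toy cluster only the peak near m/z 100 occurs in at least half of the spectra, so it is the only peak retained. The bin width and support threshold are illustrative values, not settings taken from the study.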