Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data

Bibliographic Details
Published in	Journal of proteomics Vol. 108; pp. 269 - 283
Main Authors	Qeli, Ermir, Omasits, Ulrich, Goetze, Sandra, Stekhoven, Daniel J., Frey, Juerg E., Basler, Konrad, Wollscheid, Bernd, Brunner, Erich, Ahrens, Christian H.
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 28.08.2014
Subjects	Algorithms; Animals; Bacterial Proteins - genetics; Bacterial Proteins - metabolism; Bartonella henselae - genetics; Bartonella henselae - metabolism; Databases, Protein; Drosophila melanogaster; Drosophila Proteins - genetics; Drosophila Proteins - metabolism; Leptospira interrogans - genetics; Leptospira interrogans - metabolism; Machine learning; Peptide detectability; Peptides - genetics; Peptides - metabolism; Proteotypic peptides; Rank prediction algorithms; Saccharomyces cerevisiae - genetics; Saccharomyces cerevisiae - metabolism; Saccharomyces cerevisiae Proteins; Sequence Analysis, Protein - methods; SRM; Targeted proteomics

More Information
Summary:	The in silico prediction of the best-observable “proteotypic” peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with significant impact on clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify the peptide properties that most strongly affect general detectability. Here we present PeptideRank, an approach that uses a learning-to-rank algorithm to predict peptide detectability from shotgun proteomics data and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models that predict the order of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when PeptideRank is trained on organism-specific shotgun proteomics data, and that it is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins.
Targeted proteomics approaches have been gaining momentum and hold immense potential for systems biology studies and clinical proteomics. However, because very few complete proteomes have been reported to date, a considerable fraction of any proteome lacks the experimental proteomics evidence that could guide the selection of the best-suited proteotypic peptides (PTPs), i.e., peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer.
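The summary describes training ranking models on peptide properties so that peptides are ordered within each protein, without selecting a negative training set. A minimal sketch of that general idea follows. It assumes a linear pairwise logistic ranker (a RankNet-style model swapped in for illustration; the record does not say which learning-to-rank algorithm PeptideRank uses), hypothetical two-element feature vectors, and made-up observation counts as the ranking target. All names (`make_pairs`, `train_pairwise`, `rank_peptides`, `PROTEINS`) are hypothetical.

```python
import math
from itertools import combinations

# Toy, made-up data: protein ID -> list of (feature_vector, observation_count).
# Features are hypothetical peptide properties (e.g. a scaled length and a
# scaled hydrophobicity); counts stand in for how often each peptide was seen
# in shotgun runs.
PROTEINS = {
    "P1": [([1.0, 0.2], 10), ([0.5, 0.9], 4), ([0.1, 0.5], 0)],
    "P2": [([0.9, 0.1], 7), ([0.2, 0.8], 1)],
}


def make_pairs(proteins):
    """Build (better, worse) feature pairs, only within the same protein.

    Peptides are compared only with peptides of their own protein, so no
    global split into positive and negative examples is needed. Ties are skipped.
    """
    pairs = []
    for peptides in proteins.values():
        for (xa, ya), (xb, yb) in combinations(peptides, 2):
            if ya > yb:
                pairs.append((xa, xb))
            elif yb > ya:
                pairs.append((xb, xa))
    return pairs


def score(w, x):
    """Linear score of one peptide feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pairwise(proteins, n_features, epochs=200, lr=0.1):
    """Fit a linear ranker by SGD on the pairwise loss log(1 + exp(-margin))."""
    w = [0.0] * n_features
    pairs = make_pairs(proteins)
    for _ in range(epochs):
        for better, worse in pairs:
            margin = score(w, better) - score(w, worse)
            g = sigmoid(-margin)  # magnitude of the loss gradient w.r.t. the margin
            for i in range(n_features):
                w[i] += lr * g * (better[i] - worse[i])
    return w


def rank_peptides(w, features):
    """Indices of peptides ordered from highest to lowest predicted score."""
    return sorted(range(len(features)), key=lambda i: score(w, features[i]), reverse=True)


weights = train_pairwise(PROTEINS, n_features=2)
p1_order = rank_peptides(weights, [x for x, _ in PROTEINS["P1"]])
print(p1_order)  # [0, 1, 2]: most- to least-observed peptide of P1
```

Because every training pair comes from a single protein, the model only learns which of two peptides of the same protein is more likely to be observed; unobserved peptides enter only as the lower-ranked member of a pair, never as members of a separately curated negative set.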
We describe a novel, rank-based approach for predicting the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g., web search engines such as Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and, at the same time, more closely reflect the experimentalist's need to select, e.g., the five most promising peptides for targeting a protein of interest. This approach makes it possible to predict PTPs for not-yet-observed proteins or for organisms without prior experimental proteomics data, such as many non-model organisms.
• Novel solution to predict proteotypic peptides (PTPs) to target unobserved proteins.
• Our rank-based approach complements existing PTP prediction algorithms.
• PeptideRank overcomes the difficult step of selecting positive and negative training sets.
• Organism-specific prior experimental data improves the accuracy of PTP prediction.
• We evaluated the influence of several parameters on the correct ranking of peptides.
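The experimentalist's task of picking, e.g., the five most promising peptides of a protein, and checking such a choice against shotgun data, can be sketched as follows. The metric used here (top-k overlap: the share of predicted top-k peptides that are also among the k most frequently observed ones) is a hypothetical stand-in, since the record mentions rank accuracy metrics without defining them. The function names, peptide sequences, scores, and counts are all made up for illustration.

```python
def top_k(values, k=5):
    """Return up to k peptide keys with the highest values (scores or counts)."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    return [peptide for peptide, _ in ranked[:k]]


def top_k_overlap(predicted_scores, observed_counts, k=5):
    """Share of predicted top-k peptides that are also among the k most-observed ones.

    Hypothetical evaluation metric; ties are broken by dictionary insertion order.
    """
    n = min(k, len(observed_counts))
    if n == 0:
        return 0.0
    predicted = set(top_k(predicted_scores, k))
    observed = set(top_k(observed_counts, k))
    return len(predicted & observed) / n


# Made-up peptides of one protein: predicted scores vs. observation counts.
predicted = {"AAK": 0.9, "LLR": 0.7, "GVK": 0.4, "TPEK": 0.2, "QWR": 0.1, "MNK": 0.05}
observed = {"AAK": 12, "LLR": 9, "GVK": 0, "TPEK": 8, "QWR": 5, "MNK": 1}

print(top_k(predicted, k=3))                    # ['AAK', 'LLR', 'GVK']
print(top_k_overlap(predicted, observed, k=3))  # 2 of 3 shared, i.e. about 0.667
```

The default `k=5` mirrors the "five most promising peptides" example in the text; for proteins with fewer than k peptides, the overlap is divided by the actual number of peptides instead of k.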
ISSN:	1874-3919 1876-7737
DOI:	10.1016/j.jprot.2014.05.011