Exploring Machine Learning Techniques to Improve Peptide Identification

Bibliographic Details
Published in 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 66-71
Main Authors Kirmani, Fawad; Lane, Bryan Jeremy; Rose, John R.
Format Conference Proceeding
Language English
Published IEEE 01.10.2019
Summary: Proteotypic peptides are the peptides in protein sequences that can be confidently observed by mass spectrometry-based proteomics. In recent years, there has been an increased effort to use proteotypic peptide prediction to improve the accuracy of peptide identification. These investigations compile various physicochemical peptide features to identify whether peptides are proteotypic. Here we describe our method for the selection, reduction, and evaluation of physicochemical features for proteotypic peptide prediction. We performed feature selection on a published set of features and identified six features as the most significant. To highlight the effectiveness of our reduced feature set, we trained three machine learning algorithms (support vector machines, random forests, and XGBoost) as proteotypic peptide identifiers. Importantly, for larger data sets, the random forests and XGBoost algorithms trained faster than the support vector machine, as solving the support vector machine objective function requires quadratic programming. Our three classifiers had similar if not better prediction accuracy when compared to other proteotypic peptide predictors on the same data sets.
ISSN: 2471-7819
DOI: 10.1109/BIBE.2019.00021
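
The summary mentions reducing a published feature set to six features but does not say how the selection was done. As a rough illustration only, the sketch below ranks feature columns by a Fisher score and keeps the top six. The scoring rule, the function names, the column indices, and the synthetic data are all assumptions made for this example; none of them come from the paper.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared gap between class means divided by the summed class variances."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / spread if spread > 0 else 0.0


def select_top_features(rows, labels, k=6):
    """Return the indices of the k highest-scoring feature columns, best first."""
    n_features = len(rows[0])
    scores = [fisher_score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def make_synthetic_peptides(n=400, n_features=20,
                            informative=(0, 3, 5, 8, 13, 17), seed=0):
    """Synthetic stand-in data: only the `informative` columns differ by class."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2  # hypothetical label: 1 = proteotypic, 0 = not
        rows.append([rng.gauss(2.0 * y if j in informative else 0.0, 1.0)
                     for j in range(n_features)])
        labels.append(y)
    return rows, labels


rows, labels = make_synthetic_peptides()
selected = select_top_features(rows, labels, k=6)

# Keep only the selected columns; a classifier would be trained on these.
reduced_rows = [[r[j] for j in selected] for r in rows]
print(sorted(selected))
```

In practice, the reduced columns would then be passed to a classifier such as a support vector machine, random forest, or XGBoost model, the three algorithm families named in the summary.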