Exploring Machine Learning Techniques to Improve Peptide Identification




Bibliographic Details
Published in	2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 66-71
Main Authors	Kirmani, Fawad; Lane, Bryan Jeremy; Rose, John R.
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2019
Subjects	amino acid usage; Amino acids; Data models; feature selection; machine learning; Machine learning algorithms; peptide identification; Peptides; Proteins; proteomics; Support vector machines


More Information
Summary:	Proteotypic peptides are the peptides in protein sequences that can be confidently observed by mass spectrometry-based proteomics. In recent years, there has been increasing effort to use proteotypic peptide prediction to improve the accuracy of peptide identification. These investigations compile various physicochemical peptide features to determine whether peptides are proteotypic. Here we describe our method for the selection, reduction, and evaluation of physicochemical features for proteotypic peptide prediction. We performed feature selection on a published set of features and identified six features as the most significant. To demonstrate the effectiveness of the reduced feature set, we trained three machine learning algorithms (support vector machines, random forests, and XGBoost) as proteotypic peptide classifiers. Importantly, on larger data sets the random forest and XGBoost models trained faster than the support vector machine, because fitting a support vector machine requires solving a quadratic programming problem. All three classifiers achieved prediction accuracy similar to, if not better than, that of other proteotypic peptide predictors on the same data sets.
ISSN:	2471-7819
DOI:	10.1109/BIBE.2019.00021
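The summary does not state which feature-selection criterion was used to reduce the published feature set to six. Purely as an illustration of filter-style feature reduction, the sketch below scores each physicochemical feature by a two-class Fisher discriminant ratio and keeps the top k (defaulting to six, matching the count in the summary). The criterion, the function names `fisher_scores`, `select_top_k` and `reduce_features`, and the toy data are hypothetical assumptions, not the authors' method.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> list[float]:
    """Two-class Fisher discriminant ratio for each feature column.

    Assumption (hypothetical labeling): 1 = proteotypic, 0 = not proteotypic.
    score_j = (mean_pos - mean_neg)^2 / (var_pos + var_neg); higher = more separable.
    """
    if 0 not in y or 1 not in y:
        raise ValueError("both classes (0 and 1) must be present")
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        between = (mean(pos) - mean(neg)) ** 2
        within = pvariance(pos) + pvariance(neg)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separable if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(scores: Sequence[float], k: int = 6) -> list[int]:
    """Indices of the k highest-scoring features (ties keep column order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def reduce_features(X: Sequence[Sequence[float]], idx: Sequence[int]) -> list[list[float]]:
    """Keep only the selected feature columns, in the order given by idx."""
    return [[row[j] for j in idx] for row in X]


if __name__ == "__main__":
    # Toy data invented for illustration: 4 peptides x 3 features.
    X = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [3.0, 5.0, 0.5], [3.2, 4.9, 0.4]]
    y = [1, 1, 0, 0]
    idx = select_top_k(fisher_scores(X, y), k=2)
    print(idx)  # → [0, 1]
    print(reduce_features(X, idx))
```

The reduced matrix could then be passed to any of the three classifier families named in the summary, for example scikit-learn's `SVC` and `RandomForestClassifier` or the `xgboost` package's `XGBClassifier`. Note that a univariate filter like this one ignores correlations between features, so a wrapper or model-based selection method may well choose a different six.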