Exploring Machine Learning Techniques to Improve Peptide Identification
Published in | 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE) pp. 66 - 71 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.10.2019 |
Subjects | |
Summary: | Proteotypic peptides are the peptides in protein sequences that can be confidently observed by mass spectrometry-based proteomics. In recent years, there has been an increased effort to use proteotypic peptide prediction to improve the accuracy of peptide identification. These investigations compile various physicochemical peptide features to identify whether peptides are proteotypic. Here we describe our method for the selection, reduction, and evaluation of physicochemical features for proteotypic peptide prediction. We performed feature selection on a published set of features and identified six features as the most significant. To highlight the effectiveness of our reduced feature set, we trained three machine learning algorithms (support vector machines, random forests, and XGBoost) as proteotypic peptide identifiers. Importantly, for larger data sets, the random forests and XGBoost algorithms trained faster than the support vector machine, as solving the support vector machine objective function requires quadratic programming. Our three classifiers had similar, if not better, prediction accuracy compared with other proteotypic peptide predictors on the same data sets. |
---|---|
ISSN: | 2471-7819 |
DOI: | 10.1109/BIBE.2019.00021 |
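
The summary refers to physicochemical peptide features without naming them in this record. As a rough illustration of what such features can look like, the sketch below computes four simple descriptors from a peptide sequence. The function name `peptide_features` and the choice of descriptors are assumptions made for illustration only; they are not the six features the paper selected. The hydropathy values are the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(seq):
    """Return a few illustrative physicochemical features for a peptide.

    Hypothetical helper: the descriptors here are examples, not the
    feature set identified in the paper.
    """
    seq = seq.upper()
    unknown = set(seq) - set(KD)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unknown residues: {sorted(unknown)}")
    n = len(seq)
    return {
        "length": n,
        # Mean hydropathy over all residues (often called GRAVY).
        "gravy": sum(KD[a] for a in seq) / n,
        # Fraction of basic residues (Lys, Arg, His).
        "basic_fraction": sum(a in "KRH" for a in seq) / n,
        # Fraction of acidic residues (Asp, Glu).
        "acidic_fraction": sum(a in "DE" for a in seq) / n,
    }
```

A feature dictionary like this, computed per peptide, could then be assembled into a matrix and passed to a feature-selection step and a classifier such as those named in the summary.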