Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets
Published in | Biochimica et biophysica acta. General subjects, Vol. 1864, no. 4, p. 129535 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Netherlands: Elsevier B.V., 01.04.2020 |
Summary: Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data exist for peptide:MHC (p:MHC) binding prediction. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC I allele, H-2Db. Recursive feature elimination and two-way forward feature selection were also performed. Although lower in sensitivity than current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to the most popular sequence-based algorithms. The best-performing model uses a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model were properties of the anchor positions, residues 5 and 9 of the peptide sequence. In contrast, the residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. The results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may provide information on the specific physicochemical properties that determine binding affinity.
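The abstract gives no implementation details, so the following is only a minimal sketch of the general technique it names: a support vector classifier combined with recursive feature elimination, plus the sensitivity, specificity and precision used to compare models. It assumes a linear kernel trained by plain subgradient descent on the hinge loss (so that feature weights can be ranked, as in the classic SVM-RFE scheme), labels coded as +1 (binder) and -1 (non-binder), and removal of one feature per round. All function names are hypothetical, and the two-way forward selection step is not shown.

```python
# Minimal SVM-RFE sketch (hypothetical; not the authors' code).
# Assumptions: linear kernel, hinge loss with an L2 penalty, full-batch
# subgradient descent, labels +1 (binder) / -1 (non-binder),
# and one feature removed per elimination round.
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def train_linear_svm(X: Matrix, y: Sequence[int], lam: float = 0.01,
                     epochs: int = 200, lr: float = 0.1) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, X: Matrix) -> List[int]:
    """Return +1 or -1 from the sign of the decision value w.x + b."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]


def rfe(X: Matrix, y: Sequence[int], names: Sequence[str],
        n_keep: int) -> List[str]:
    """Refit on the surviving features and drop the one with the smallest |w|."""
    keep = list(range(len(names)))
    while len(keep) > n_keep:
        X_sub = [[row[j] for j in keep] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(keep)), key=lambda k: abs(w[k]))
        del keep[weakest]  # weakest indexes into keep, not into names
    return [names[j] for j in keep]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # 0.0 when the denominator is empty

    return {"sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "precision": ratio(tp, tp + fp)}
```

On a toy set where only the first column separates the classes, `rfe(..., n_keep=1)` keeps that column. A model that predicts few positives can still score well on specificity and precision while scoring poorly on sensitivity, which is the trade-off the summary describes. In practice an established SVM library and cross-validated evaluation would replace this pure-Python loop, which only makes the elimination step explicit.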
• Sequence-based physicochemical descriptor sets were benchmarked for predicting peptide binding to the mouse MHC I allele H-2Db.
• Binding prediction improved when the best-performing sequence-based features were combined with structure-based features.
• Features of the residues at anchor positions 5 and 9 strongly influence peptide binding prediction.
• Residues at positions 4 and 6, flanking anchor residue 5, made little to no contribution to the model's binding affinity prediction.
• The machine learning method described here can be useful for filtering samples that contain a large number of potential binders.
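To make the anchor-position findings concrete, here is one hypothetical way to turn a 9-mer into position-specific physicochemical features. The abstract does not list the authors' descriptor sets; the Kyte-Doolittle hydropathy scale and a crude side-chain charge are stand-ins chosen for illustration, and `encode_9mer` and `anchor_features` are made-up names. Positions are 1-based so that P5 and P9 correspond to the anchor residues named above.

```python
# Hypothetical position-specific descriptors for a 9-mer peptide.
# The Kyte-Doolittle scale and the charge table are illustrative stand-ins,
# not the descriptor sets used in the paper.
from typing import Dict

KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain charge near neutral pH; His is treated as neutral.
SIDE_CHAIN_CHARGE: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

ANCHOR_POSITIONS = (5, 9)  # 1-based anchor positions named in the abstract


def encode_9mer(peptide: str) -> Dict[str, float]:
    """Map each of the 9 positions to a hydropathy and a charge feature."""
    peptide = peptide.upper()
    if len(peptide) != 9:
        raise ValueError(f"expected a 9-mer, got {len(peptide)} residues")
    unknown = set(peptide) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    features: Dict[str, float] = {}
    for pos, residue in enumerate(peptide, start=1):
        features[f"P{pos}_hydropathy"] = KYTE_DOOLITTLE[residue]
        features[f"P{pos}_charge"] = SIDE_CHAIN_CHARGE.get(residue, 0.0)
    return features


def anchor_features(features: Dict[str, float]) -> Dict[str, float]:
    """Keep only the features that belong to the anchor positions (P5, P9)."""
    return {name: value for name, value in features.items()
            if int(name[1:name.index("_")]) in ANCHOR_POSITIONS}
```

For example, `encode_9mer("ASNENMETM")` yields 18 features, and `anchor_features` reduces them to the four P5 and P9 columns. Selecting columns by position in this way loosely mirrors, in a much simplified form, the observation that anchor-position properties dominate the retained descriptors; a structure-based descriptor set would simply be appended as extra columns.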
Bibliography | USDOE AC05-00OR22725 |
---|---|
ISSN | 0304-4165; 1872-8006 |
DOI | 10.1016/j.bbagen.2020.129535 |