Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets

Bibliographic Details
Published in Biochimica et biophysica acta. General subjects Vol. 1864; no. 4; p. 129535
Main Authors Aranha, Michelle P., Spooner, Catherine, Demerdash, Omar, Czejdo, Bogdan, Smith, Jeremy C., Mitchell, Julie C.
Format Journal Article
Language English
Published Netherlands Elsevier B.V 01.04.2020
Elsevier
Summary: Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data exist for peptide:MHC (p:MHC) binding predictions. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC I allele, H-2Db. Recursive feature elimination and two-way forward feature selection were also performed. Although lower in sensitivity than the current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to the most popular sequence-based algorithms. The best-performing model uses a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model were properties of the anchor positions, residues 5 and 9 in the peptide sequence. In contrast, residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. The results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may provide information on specific physicochemical properties determining binding affinities.
• Sequence-based physicochemical descriptor sets were benchmarked for predicting peptide binding to the mouse MHC I allele H-2Db.
• Binding prediction improved when the best-performing sequence-based features were combined with structure-based features.
• Features of the residues at the anchor sites, positions 5 and 9, greatly influence peptide binding prediction.
• Residues at positions 4 and 6, which flank the anchor residue, made little to no contribution to the model's binding affinity prediction.
• The machine learning method described here can be useful for filtering samples with a large number of potential binders.
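The workflow named in the summary (a support vector machine classifier, recursive feature elimination, and evaluation by sensitivity, specificity and precision) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not give their kernel, hyperparameters, descriptor sets or data, so the example assumes scikit-learn, a linear kernel (needed so that feature weights are available for elimination), a synthetic descriptor matrix, and an arbitrary target of 8 retained features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a peptide descriptor matrix (NOT real p:MHC data):
# 300 "peptides", 20 numeric descriptors, label 1 = binder, 0 = non-binder.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination: repeatedly fit the SVM and drop the
# descriptor with the smallest weight until 8 remain (8 is an assumption).
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=8, step=1)
selector.fit(X_train, y_train)

# Predict with the SVM refit on the retained descriptors only.
y_pred = selector.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

sensitivity = tp / (tp + fn)  # fraction of true binders recovered
specificity = tn / (tn + fp)  # fraction of non-binders correctly rejected
precision = tp / (tp + fp) if (tp + fp) else float("nan")

kept = np.flatnonzero(selector.support_)  # indices of retained descriptors
print("retained descriptor indices:", kept.tolist())
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"precision={precision:.2f}")
```

A model with high specificity and precision but lower sensitivity, as the summary reports for the physicochemical descriptor sets, would reject most non-binders while missing some true binders. That trade-off suits filtering a large pool of candidates.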
USDOE
AC05-00OR22725
ISSN: 0304-4165
1872-8006
DOI: 10.1016/j.bbagen.2020.129535