Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets

Bibliographic Details
Published in	Biochimica et biophysica acta. General subjects Vol. 1864; no. 4; p. 129535
Main Authors	Aranha, Michelle P., Spooner, Catherine, Demerdash, Omar, Czejdo, Bogdan, Smith, Jeremy C., Mitchell, Julie C.
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 01.04.2020
Subjects	BASIC BIOLOGICAL SCIENCES; Binding affinity; Machine learning; MHC-peptide
Summary:	Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data already exist for peptide:MHC (p:MHC) binding prediction. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC I allele, H-2Db. Recursive feature elimination and two-way forward feature selection were also performed. Although their sensitivity is lower than that of current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to the most popular sequence-based algorithms. The best-performing model uses a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model were properties of the anchor positions, residues 5 and 9 of the peptide sequence. In contrast, residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. The results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may provide information on the specific physicochemical properties that determine binding affinities.
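To make the idea of a per-position physicochemical sequence descriptor concrete, here is a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale as a stand-in property; this record does not list the descriptors actually used in the study. The function names `encode_peptide` and `anchor_features` are hypothetical, and the 9-residue string is an arbitrary input chosen for illustration. Only the anchor positions (5 and 9) come from the summary.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index per amino acid.
# ASSUMPTION: used here only as an example physicochemical property;
# the study's actual descriptor sets are not given in this record.
KD_HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# 1-based anchor positions named in the summary for H-2Db.
ANCHOR_POSITIONS = (5, 9)


def encode_peptide(peptide: str) -> List[float]:
    """Return one hydropathy value per residue, in sequence order."""
    peptide = peptide.upper()
    unknown = sorted({aa for aa in peptide if aa not in KD_HYDROPATHY})
    if unknown:
        raise ValueError(f"unsupported residue(s): {unknown}")
    return [KD_HYDROPATHY[aa] for aa in peptide]


def anchor_features(peptide: str) -> Dict[int, float]:
    """Return the descriptor value at each anchor position (1-based)."""
    values = encode_peptide(peptide)
    if len(values) < max(ANCHOR_POSITIONS):
        raise ValueError("peptide is shorter than the last anchor position")
    return {pos: values[pos - 1] for pos in ANCHOR_POSITIONS}


if __name__ == "__main__":
    example = "ASNENMETM"  # arbitrary illustrative 9-mer
    print(encode_peptide(example))
    print(anchor_features(example))
```

A vector like this, possibly built from several properties per position, is the kind of input a classifier could consume; feature selection would then decide which positions and properties to keep.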
•Sequence-based physicochemical descriptor sets were benchmarked for predicting peptide binding to the mouse MHC I allele H-2Db.
•Binding prediction improved when the best-performing sequence-based features were combined with structure-based features.
•Features of the residues at anchor sites 5 and 9 greatly influence peptide binding prediction.
•Residues at positions 4 and 6, flanking the anchor residue, made little to no contribution to the binding affinity prediction of the model.
•The machine learning method described here can be useful for filtering samples with a large number of potential binders.
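The summary compares models by sensitivity, specificity, and precision. The sketch below shows how these three metrics follow from the standard confusion-matrix definitions. The function name `binary_metrics` and the label lists are hypothetical, and the counts are invented for illustration; they are not results from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and precision for binder/non-binder labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true binders recovered
        "specificity": ratio(tn, tn + fp),  # fraction of non-binders rejected
        "precision": ratio(tp, tp + fp),    # fraction of predicted binders that bind
    }


if __name__ == "__main__":
    # Invented counts: 100 binders (30 found), 900 non-binders (20 false alarms).
    y_true = [True] * 100 + [False] * 900
    y_pred = [True] * 30 + [False] * 70 + [False] * 880 + [True] * 20
    print(binary_metrics(y_true, y_pred))
```

With counts like these, sensitivity is low while specificity and precision stay comparatively high, which is the profile that suits filtering a large candidate pool: few non-binders pass the filter, at the cost of missing some true binders.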
Bibliography:	USDOE AC05-00OR22725
ISSN:	0304-4165 1872-8006
DOI:	10.1016/j.bbagen.2020.129535