Improved pan-specific prediction of MHC class I peptide binding using a novel receptor clustering data partitioning strategy

Bibliographic Details
Published in	HLA, Vol. 88, no. 6, pp. 287-292
Main Authors	Mattsson, A. H., Kringelum, J. V., Garde, C., Nielsen, M.
Format	Journal Article
Language	English
Published	Oxford, UK: Blackwell Publishing Ltd, 01.12.2016
Subjects	Alleles; Animals; artificial neural networks; Binding Sites; clustering; Epitopes - chemistry; Epitopes - immunology; Gene Expression; Gorilla gorilla; Histocompatibility Antigens Class I - chemistry; Histocompatibility Antigens Class I - genetics; Histocompatibility Antigens Class I - immunology; Humans; Ligands; Macaca; Machine Learning; MHC binding specificity; MHC class I; Mice; Oligopeptides - chemistry; Oligopeptides - genetics; Oligopeptides - immunology; Pan troglodytes; peptide-MHC binding; Protein Binding; Protein Interaction Domains and Motifs; Software; Structural Homology, Protein; T-cell epitope

More Information
Summary:	Pan‐specific prediction of receptor–ligand interaction is conventionally done using machine‐learning methods that integrate information about both receptor and ligand primary sequences. To achieve optimal performance using machine learning, dealing with overfitting and data redundancy is critical. Most often, so‐called ligand clustering methods have been used to deal with these issues in the context of pan‐specific receptor–ligand predictions, and for the MHC system the approach has proven highly effective for extrapolating information from a limited set of receptors with well‐characterized binding motifs to others with no or very limited experimental characterization. The success of this approach has, however, proven to depend strongly on the similarity of the query molecule to the molecules with characterized specificity used in the machine‐learning process. Here, we outline an alternative strategy with the aim of altering this, and construct data sets optimal for training pan‐specific receptor–ligand predictors, focusing on receptor similarity rather than ligand similarity. We show that, in benchmarks covering affinity prediction, MHC ligand identification and MHC epitope identification, this receptor clustering method consistently performs better than the conventional ligand clustering method on alleles with remote similarity to the training set.
Bibliography:	Evaxion Biotech; ark:/67375/WNG-LM49DKLS-8; Data S1. Training and evaluation data overview. Data S2. RC (distance matrix). Data S3. External evaluation data. Data S4. F-rank evaluation data overview. Data S5. SYFPEITHI ligands. Data S6. IEDB ligands. Data S7. IEDB T-cell epitopes.; istex:83707AC226F61464DB50E81EC745326E16420B79; ArticleID:TAN12911
ISSN:	2059-2302 2059-2310
DOI:	10.1111/tan.12911
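
Illustrative sketch (not from the article): the summary describes partitioning training data by receptor similarity rather than ligand similarity. The Python below is a minimal sketch of that general idea, not the authors' procedure. Everything in it is an assumption made for illustration: the receptor names and sequences are invented toy data, the distance is a plain Hamming fraction, clusters are formed by single linkage at an arbitrary threshold, and whole clusters are assigned greedily to cross-validation folds so that similar receptors never end up on both sides of a train/test split.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_receptors(seqs: dict, threshold: float) -> list:
    """Single-linkage clustering: receptors closer than `threshold` are linked."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if hamming_fraction(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def assign_folds(clusters: list, counts: dict, n_folds: int) -> dict:
    """Greedily place each cluster (largest first) into the smallest fold."""
    def size(cluster):
        return sum(counts[r] for r in cluster)

    fold_of = {}
    fold_sizes = [0] * n_folds
    for cluster in sorted(clusters, key=lambda c: (-size(c), sorted(c))):
        fold = fold_sizes.index(min(fold_sizes))
        for receptor in cluster:
            fold_of[receptor] = fold
        fold_sizes[fold] += size(cluster)
    return fold_of


# Hypothetical toy receptors (invented names and sequences, not real alleles).
toy_seqs = {
    "A1": "YFAMYQENMA",
    "A2": "YFAMYQENVA",
    "B1": "YHTKYREISA",
    "B2": "YHTKYREISE",
    "C1": "GSWDLPRTCN",
}
# Hypothetical number of binding measurements available per receptor.
toy_counts = {"A1": 100, "A2": 80, "B1": 60, "B2": 50, "C1": 40}

toy_clusters = cluster_receptors(toy_seqs, threshold=0.2)
toy_folds = assign_folds(toy_clusters, toy_counts, n_folds=2)
```

In this toy setup, every measurement for a receptor goes to the fold of its cluster, so a model tested on one fold has seen no close relatives of the test receptors during training. A ligand clustering partition would instead group peptides by sequence similarity and could place near-identical receptors in both training and test data.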