On TCR binding predictors failing to generalize to unseen peptides

Bibliographic Details
Published in	Frontiers in immunology Vol. 13; p. 1014256
Main Authors	Grazioli, Filippo; Mösch, Anja; Machart, Pierre; Li, Kai; Alqassem, Israa; O'Donnell, Timothy J; Min, Martin Renqiang
Format	Journal Article
Language	English
Published	Switzerland: Frontiers Media S.A., 21.10.2022
Subjects	binding prediction; Immunology; interaction prediction; machine learning; MHC; peptide; Peptides - metabolism; Protein Binding; Receptors, Antigen, T-Cell - metabolism; TCR - T cell receptor

More Information
Summary:	Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets that include peptide sequences also present in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples . We propose the , a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models' test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
Bibliography:	Edited by: Matthew Call, The University of Melbourne, Australia. This article was submitted to T Cell Biology, a section of the journal Frontiers in Immunology. Reviewed by: Pieter Meysman, University of Antwerp, Belgium.
ISSN:	1664-3224
DOI:	10.3389/fimmu.2022.1014256
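
Note on the split described in the summary: the article proposes a training/test split in which test samples present only peptides that do not occur in the training set. The record gives no further detail, so the following is only a minimal sketch of a peptide-disjoint split in general, not the authors' heuristic. The function name `peptide_disjoint_split`, the `(tcr, peptide, label)` tuple layout, the `test_fraction` and `seed` parameters, and the toy sequences are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical sample layout: (TCR CDR3 sequence, peptide sequence, binding label).
Sample = Tuple[str, str, int]


def peptide_disjoint_split(
    samples: List[Sample], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that no peptide appears in both training and test sets."""
    # Collect the unique peptides and shuffle them reproducibly.
    peptides = sorted({peptide for _, peptide, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(peptides)

    # Hold out whole peptides (not individual samples) for testing.
    n_test = max(1, round(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])

    train = [s for s in samples if s[1] not in test_peptides]
    test = [s for s in samples if s[1] in test_peptides]
    return train, test


# Toy data with made-up CDR3 strings, for illustration only.
samples = [
    ("CASSAAAF", "GILGFVFTL", 1),
    ("CASSBBBF", "GILGFVFTL", 0),
    ("CASSCCCF", "NLVPMVATV", 1),
    ("CASSDDDF", "NLVPMVATV", 0),
    ("CASSEEEF", "GLCTLVAML", 1),
]
train, test = peptide_disjoint_split(samples, test_fraction=0.34, seed=0)
```

Because whole peptides are held out, the fraction of samples in the test set can differ from `test_fraction` when peptides have unequal numbers of associated TCRs.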