Designing meaningful continuous representations of T cell receptor sequences with deep generative models
Published in | Nature communications Vol. 15; no. 1; pp. 4271 - 14 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | London: Nature Publishing Group UK, 20.05.2024 (Nature Publishing Group; Nature Portfolio) |
Subjects | |
Summary: | T cell receptor (TCR) antigen binding underlies a key mechanism of the adaptive immune response, yet the vast diversity of TCRs and the complexity of protein interactions limit our ability to build useful low-dimensional representations of TCRs. To address the current limitations in TCR analysis, we develop TCR-VALID, a capacity-controlled disentangling variational autoencoder trained on a dataset of approximately 100 million TCR sequences. We design TCR-VALID such that the model representations are low-dimensional, continuous, disentangled, and sufficiently informative to provide high-quality de novo TCR sequence generation. We thoroughly quantify these properties of the representations, providing a framework for future protein representation learning in low dimensions. The continuity of TCR-VALID representations allows fast and accurate TCR clustering, which we benchmark against other state-of-the-art TCR clustering tools and pre-trained language models.
Relating T cell receptor (TCR) sequencing to antigen specificity is a challenge, especially when TCR specificity is unclear. Here, the authors use a low-dimensional generative approach to model TCR sequence similarity and to associate TCR sequences with the same specificity. |
---|---|
ISSN: | 2041-1723 |
DOI: | 10.1038/s41467-024-48198-0 |
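
The summary describes a "capacity-controlled" variational autoencoder but does not state the training objective. As a rough illustration only, the sketch below shows one common form of capacity control, in which the KL term is pushed toward a target capacity C that is gradually raised during training. It assumes a diagonal Gaussian encoder and a standard normal prior; the function names, the weight `gamma`, and the linear schedule are hypothetical and are not taken from TCR-VALID itself.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in nats, one value per example."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def capacity_controlled_loss(recon_nll, mu, log_var, capacity, gamma=100.0):
    """Batch loss: mean reconstruction NLL + gamma * |mean KL - capacity|.

    recon_nll: per-example reconstruction negative log-likelihood, shape (batch,)
    mu, log_var: encoder outputs, shape (batch, latent_dim)
    capacity: target KL in nats (hypothetical hyperparameter)
    gamma: penalty weight (hypothetical default)
    """
    mean_kl = np.mean(gaussian_kl(mu, log_var))
    return float(np.mean(recon_nll) + gamma * np.abs(mean_kl - capacity))


def capacity_schedule(step, max_capacity, warmup_steps):
    """Linearly raise the target capacity from 0 to max_capacity (assumed schedule)."""
    return max_capacity * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.normal(size=(8, 2))          # 8 sequences, 2 latent dimensions
    log_var = rng.normal(size=(8, 2)) * 0.1
    recon_nll = rng.uniform(1.0, 3.0, size=8)
    c = capacity_schedule(step=500, max_capacity=4.0, warmup_steps=1000)
    print(capacity_controlled_loss(recon_nll, mu, log_var, capacity=c))
```

Under this formulation the penalty vanishes when the average KL equals the target, so the latent code is allowed to carry roughly C nats of information rather than being pushed toward zero as in an unconstrained KL penalty.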