Designing meaningful continuous representations of T cell receptor sequences with deep generative models

Bibliographic Details
Published in Nature Communications Vol. 15; no. 1; pp. 4271 - 14
Main Authors Leary, Allen Y.; Scott, Darius; Gupta, Namita T.; Waite, Janelle C.; Skokos, Dimitris; Atwal, Gurinder S.; Hawkins, Peter G.
Format Journal Article
Language English
Published London: Nature Publishing Group UK, 20.05.2024
Nature Publishing Group
Nature Portfolio

Summary: T Cell Receptor (TCR) antigen binding underlies a key mechanism of the adaptive immune response, yet the vast diversity of TCRs and the complexity of protein interactions limit our ability to build useful low-dimensional representations of TCRs. To address the current limitations in TCR analysis, we develop a capacity-controlled disentangling variational autoencoder, which we name TCR-VALID, trained on a dataset of approximately 100 million TCR sequences. We design TCR-VALID such that the model representations are low-dimensional, continuous, disentangled, and sufficiently informative to provide high-quality de novo TCR sequence generation. We thoroughly quantify these properties of the representations, providing a framework for future protein representation learning in low dimensions. The continuity of TCR-VALID representations allows fast and accurate TCR clustering, which is benchmarked against other state-of-the-art TCR clustering tools and pre-trained language models. Relating T cell receptor (TCR) sequencing to antigen specificity is a challenge, especially when TCR specificity is unclear. Here the authors use a low-dimensional generative approach to model TCR sequence similarity and to associate TCR sequences with the same specificity.
ISSN: 2041-1723
DOI: 10.1038/s41467-024-48198-0