TCRCluster: a novel approach to T-cell receptor latent featurization and clustering using contrastive learning-guided two-stage variational autoencoders
Published in | NAR genomics and bioinformatics Vol. 7; no. 2; p. lqaf065 |
---|---|
Main Authors | Wan, Yat-Tsai Richie; Nielsen, Morten |
Format | Journal Article |
Language | English |
Published | England, Oxford University Press, 01.06.2025 |
ISSN | 2631-9268 |
DOI | 10.1093/nargab/lqaf065 |
Abstract | T cells play a vital role in adaptive immunity by targeting pathogen-infected or cancerous cells, but predicting their specificity remains challenging. Encoding T-cell receptor (TCR) sequences into informative feature spaces is therefore crucial for advancing specificity prediction and downstream applications. For this, we developed a variational autoencoder (VAE)-based model trained on paired TCR α–β chain data, incorporating all six complementarity-determining regions. A semi-supervised ‘two-stage VAE’ framework, integrating cosine triplet loss and a classifier, was found to further refine peptide-specific latent representations, outperforming sequence-based methods in specificity prediction. Clustering analyses leveraging our VAE latent space were evaluated using K-means, agglomerative clustering, and a novel graph-based method. Agglomerative clustering achieved the most biologically relevant results, balancing cluster purity and retention despite noise in TCR specificity annotations. We extended these insights to evaluate TCR repertoire data. Across datasets, VAE-based models outperformed sequence-based methods, particularly in retention metrics, with notable improvements in the SARS-CoV-2 repertoire dataset. Moreover, the cancer repertoire analysis highlighted the generalizability of our approach, where the model displayed high performance despite minimal similarity between the training and test data. Collectively, these results demonstrate the potential of VAE-based latent representations to offer a robust framework for prediction, clustering, and repertoire analysis. |
---|---|
Author | Wan, Yat-Tsai Richie; Nielsen, Morten |
Author_xml | – sequence: 1 givenname: Yat-Tsai Richie orcidid: 0000-0003-0814-0289 surname: Wan fullname: Wan, Yat-Tsai Richie – sequence: 2 givenname: Morten orcidid: 0000-0001-7885-4311 surname: Nielsen fullname: Nielsen, Morten |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40432791 (View this record in MEDLINE/PubMed) |
ContentType | Journal Article |
Copyright | The Author(s) 2025. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. |
EISSN | 2631-9268 |
ExternalDocumentID | PMC12107435 40432791 10_1093_nargab_lqaf065 |
Genre | Journal Article |
GrantInformation_xml | – fundername: ; grantid: U24CA248138 – fundername: ; grantid: 101007799 – fundername: ; grantid: 75N93019C00001 |
ISSN | 2631-9268 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2 |
Language | English |
License | https://creativecommons.org/licenses/by-nc/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site; for further information, please contact journals.permissions@oup.com. |
ORCID | 0000-0001-7885-4311 (Nielsen, Morten); 0000-0003-0814-0289 (Wan, Yat-Tsai Richie) |
OpenAccessLink | http://dx.doi.org/10.1093/nargab/lqaf065 |
PMID | 40432791 |
PublicationDate | 2025-06-01 |
PublicationPlace | England |
PublicationTitle | NAR genomics and bioinformatics |
PublicationTitleAlternate | NAR Genom Bioinform |
PublicationYear | 2025 |
Publisher | Oxford University Press |
StartPage | lqaf065 |
SubjectTerms | Algorithms; Autoencoder; Cluster Analysis; Complementarity Determining Regions - genetics; Humans; Receptors, Antigen, T-Cell - genetics; Receptors, Antigen, T-Cell, alpha-beta - genetics |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40432791 https://www.proquest.com/docview/3212783347 https://pubmed.ncbi.nlm.nih.gov/PMC12107435 |
Volume | 7 |
linkProvider | National Library of Medicine |
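
The abstract states that the semi-supervised two-stage VAE integrates a cosine triplet loss to refine peptide-specific latent representations. The record gives no formula, so the following is only a minimal sketch of the commonly used form of such a loss, not the authors' implementation. The function names, the margin value of 0.2, and the use of plain Python lists as latent vectors are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def cosine_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distances (illustrative only).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; the margin is a hypothetical value.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy latent vectors standing in for encoded TCRs (made-up values).
well_separated = cosine_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
badly_separated = cosine_triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Under this reading, the anchor and positive would be TCRs annotated with the same peptide and the negative a TCR annotated with a different one, so minimizing the loss pulls same-peptide TCRs together in the latent space. How triplets are actually sampled and how the loss is weighted against the VAE objective is not described in this record.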