TCRCluster: a novel approach to T-cell receptor latent featurization and clustering using contrastive learning-guided two-stage variational autoencoders

Bibliographic Details
Published in NAR genomics and bioinformatics Vol. 7; no. 2; p. lqaf065
Main Authors Wan, Yat-Tsai Richie; Nielsen, Morten
Format Journal Article
Language English
Published England Oxford University Press 01.06.2025
ISSN 2631-9268
DOI 10.1093/nargab/lqaf065

Abstract T cells play a vital role in adaptive immunity by targeting pathogen-infected or cancerous cells, but predicting their specificity remains challenging. Encoding T-cell receptor (TCR) sequences into informative feature spaces is therefore crucial for advancing specificity prediction and downstream applications. For this, we developed a variational autoencoder (VAE)-based model trained on paired TCR α–β chain data, incorporating all six complementarity-determining regions. A semi-supervised ‘two-stage VAE’ framework, integrating cosine triplet loss and a classifier, was found to further refine peptide-specific latent representations, outperforming sequence-based methods in specificity prediction. Clustering analyses leveraging our VAE latent space were evaluated using K-means, agglomerative clustering, and a novel graph-based method. Agglomerative clustering achieved the most biologically relevant results, balancing cluster purity and retention despite noise in TCR specificity annotations. We extended these insights to evaluate TCR repertoire data. Across datasets, VAE-based models outperformed sequence-based methods, particularly in retention metrics, with notable improvements in the SARS-CoV-2 repertoire dataset. Moreover, the cancer repertoire analysis highlighted the generalizability of our approach, where the model displayed high performance despite minimal similarity between the training and test data. Collectively, these results demonstrate the potential of VAE-based latent representations to offer a robust framework for prediction, clustering, and repertoire analysis.
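As an illustration only, the cosine triplet loss and the purity/retention trade-off mentioned in the abstract can be sketched in a few lines of Python. Everything named here is an assumption made for illustration and not the authors' implementation: the function names, the margin value of 0.2, the `min_size` cut-off that treats singleton clusters as not retained, and the majority-label definition of purity are not taken from the article.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cosine_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distances.

    The loss is zero once the anchor is closer to the positive (same peptide)
    than to the negative (different peptide) by at least `margin`.
    The margin value is a hypothetical default, not one from the article.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def purity_and_retention(cluster_ids, labels, min_size=2):
    """Assumed definitions, for illustration:

    retention = fraction of TCRs placed in clusters of at least `min_size`
    purity    = fraction of retained TCRs carrying their cluster's majority label
    """
    members = {}
    for cid, label in zip(cluster_ids, labels):
        members.setdefault(cid, []).append(label)
    kept = [labs for labs in members.values() if len(labs) >= min_size]
    n_kept = sum(len(labs) for labs in kept)
    if n_kept == 0:
        return 0.0, 0.0
    majority = sum(Counter(labs).most_common(1)[0][1] for labs in kept)
    return majority / n_kept, n_kept / len(labels)


if __name__ == "__main__":
    # Toy latent vectors: the positive matches the anchor, the negative is orthogonal.
    print(cosine_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
    # Toy clustering of six TCRs with peptide labels A, B, and C.
    print(purity_and_retention([0, 0, 0, 1, 1, 2], ["A", "A", "B", "C", "C", "A"]))
```

Under these assumed definitions, merging more TCRs into clusters raises retention but tends to lower purity, which is the balance the abstract refers to when comparing clustering methods.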
Author_xml – sequence: 1
  givenname: Yat-Tsai Richie
  orcidid: 0000-0003-0814-0289
  surname: Wan
  fullname: Wan, Yat-Tsai Richie
– sequence: 2
  givenname: Morten
  orcidid: 0000-0001-7885-4311
  surname: Nielsen
  fullname: Nielsen, Morten
ContentType Journal Article
Copyright The Author(s) 2025. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
EISSN 2631-9268
ExternalDocumentID PMC12107435
40432791
10_1093_nargab_lqaf065
Genre Journal Article
GrantInformation_xml – fundername: ;
  grantid: U24CA248138
– fundername: ;
  grantid: 101007799
– fundername: ;
  grantid: 75N93019C00001
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
License https://creativecommons.org/licenses/by-nc/4.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.
OpenAccessLink http://dx.doi.org/10.1093/nargab/lqaf065
PMID 40432791
PublicationTitle NAR genomics and bioinformatics
PublicationTitleAlternate NAR Genom Bioinform
SubjectTerms Algorithms
Autoencoder
Cluster Analysis
Complementarity Determining Regions - genetics
Humans
Receptors, Antigen, T-Cell - genetics
Receptors, Antigen, T-Cell, alpha-beta - genetics
URI https://www.ncbi.nlm.nih.gov/pubmed/40432791
https://www.proquest.com/docview/3212783347
https://pubmed.ncbi.nlm.nih.gov/PMC12107435