Parametric UMAP Embeddings for Representation and Semisupervised Learning

Bibliographic Details
Published in Neural Computation Vol. 33; no. 11; pp. 2881-2907
Main Authors Sainburg, Tim; McInnes, Leland; Gentner, Timothy Q.
Format Journal Article
Language English
Published MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 12 October 2021
MIT Press Journals, The
ISSN 0899-7667
EISSN 1530-888X
DOI 10.1162/neco_a_01434

Abstract UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
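The second step described in the abstract, and its parametric variant, can be illustrated with a small sketch. This is a toy under several stated assumptions, not the authors' implementation: the graph weights are written by hand rather than computed from a fuzzy simplicial complex, the encoder is a hypothetical linear map standing in for a neural network, the gradient is estimated by finite differences rather than by stochastic gradient descent with negative sampling, and the curve parameters a and b are fixed at 1. All function and variable names are illustrative.

```python
import math
import random


def low_dim_similarity(u, v, a=1.0, b=1.0):
    """Similarity q = 1 / (1 + a * d^(2b)) between two embedded points."""
    d2 = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return 1.0 / (1.0 + a * d2 ** b)


def encode(weights, x):
    """Hypothetical linear encoder: one weight row per embedding dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def graph_loss(weights, data, graph, eps=1e-6):
    """Cross-entropy between graph weights p and embedding similarities q."""
    z = [encode(weights, x) for x in data]
    total = 0.0
    for (i, j), p in graph.items():
        q = min(max(low_dim_similarity(z[i], z[j]), eps), 1.0 - eps)
        total -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return total


def train(data, graph, out_dim=1, steps=200, lr=0.05, h=1e-5, seed=0):
    """Optimize encoder weights (not point coordinates) by gradient descent."""
    rng = random.Random(seed)
    in_dim = len(data[0])
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]
    for _ in range(steps):
        base = graph_loss(weights, data, graph)
        grads = []
        for r in range(out_dim):
            row = []
            for c in range(in_dim):
                # Forward finite difference as a stand-in for backpropagation.
                weights[r][c] += h
                row.append((graph_loss(weights, data, graph) - base) / h)
                weights[r][c] -= h
            grads.append(row)
        for r in range(out_dim):
            for c in range(in_dim):
                weights[r][c] -= lr * grads[r][c]
    return weights


# Toy data: two tight pairs of points, far apart from each other.
data = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
# Hand-written graph: weight 1 within a pair, weight 0 across pairs.
graph = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.0, (1, 3): 0.0}

weights = train(data, graph)
# Because the mapping is parametric, a new point is embedded by a single
# forward pass through the encoder, with no further optimization.
new_point_embedding = encode(weights, [3.05, 3.0])
```

The design point the sketch is meant to show is that the optimized quantity is the encoder's weights rather than the embedding coordinates themselves, which is what allows unseen data to be embedded directly.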
AuthorAffiliation 1 University of California San Diego, La Jolla, CA 92093, U.S.A. timsainb@gmail.com
2 Tutte Institute for Mathematics and Computing, Ottawa, Ontario Canada leland.mcinnes@gmail.com
3 University of California San Diego, La Jolla, CA 92093, U.S.A. tgentner@ucsd.edu
ContentType Journal Article
Copyright Copyright MIT Press Journals, The 2021
2021 Massachusetts Institute of Technology.
Discipline Computer Science
EISSN 1530-888X
EndPage 2907
ExternalDocumentID PMC8516496
10_1162_neco_a_01434
neco_a_01434.pdf
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
Notes November, 2021
OpenAccessLink https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01434
PMID 34474477
PageCount 27
PublicationDate 2021-10-12
PublicationPlace One Rogers Street, Cambridge, MA 02142-1209, USA (journals-info@mit.edu)
PublicationTitle Neural computation
PublicationYear 2021
Publisher MIT Press
MIT Press Journals, The
StartPage 2881
SubjectTerms Algorithms
Embedding
Fuzzy sets
Graphical representations
Machine learning
Neural networks
Regularization
Semi-supervised learning
Structured data
Topology
Title Parametric UMAP Embeddings for Representation and Semisupervised Learning
URI https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01434
https://www.proquest.com/docview/2896826393
https://www.proquest.com/docview/2569376068
https://pubmed.ncbi.nlm.nih.gov/PMC8516496
Volume 33