Parametric UMAP Embeddings for Representation and Semisupervised Learning
UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial co...
Published in | Neural computation Vol. 33; no. 11; pp. 2881 - 2907 |
---|---|
Main Authors | Sainburg, Tim; McInnes, Leland; Gentner, Timothy Q. |
Format | Journal Article |
Language | English |
Published | One Rogers Street, Cambridge, MA 02142-1209, USA: MIT Press, 12.10.2021; MIT Press Journals, The |
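Subjects | Algorithms; Embedding; Fuzzy sets; Graphical representations; Machine learning; Neural networks; Regularization; Semi-supervised learning; Structured data; Topology |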
ISSN | 0899-7667 (print); 1530-888X (electronic) |
DOI | 10.1162/neco_a_01434 |
Abstract | UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data. |
---|---|
Author | Gentner, Timothy Q.; Sainburg, Tim; McInnes, Leland |
AuthorAffiliation | 1 University of California San Diego, La Jolla, CA 92093, U.S.A. (timsainb@gmail.com); 2 Tutte Institute for Mathematics and Computing, Ottawa, Ontario, Canada (leland.mcinnes@gmail.com); 3 University of California San Diego, La Jolla, CA 92093, U.S.A. (tgentner@ucsd.edu) |
Author_xml | – sequence: 1 givenname: Tim surname: Sainburg fullname: Sainburg, Tim email: timsainb@gmail.com organization: University of California San Diego, La Jolla, CA 92093, U.S.A. timsainb@gmail.com – sequence: 2 givenname: Leland surname: McInnes fullname: McInnes, Leland email: leland.mcinnes@gmail.com organization: Tutte Institute for Mathematics and Computing, Ottawa, Ontario Canada leland.mcinnes@gmail.com – sequence: 3 givenname: Timothy Q. surname: Gentner fullname: Gentner, Timothy Q. email: tgentner@ucsd.edu organization: University of California San Diego, La Jolla, CA 92093, U.S.A. tgentner@ucsd.edu |
ContentType | Journal Article |
Copyright | Copyright MIT Press Journals, The 2021; 2021 Massachusetts Institute of Technology |
Discipline | Computer Science |
EISSN | 1530-888X |
EndPage | 2907 |
ExternalDocumentID | PMC8516496 10_1162_neco_a_01434 neco_a_01434.pdf |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Notes | November, 2021 |
OpenAccessLink | https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01434 |
PMID | 34474477 |
PageCount | 27 |
PublicationDate | 2021-10-12 |
PublicationPlace | One Rogers Street, Cambridge, MA 02142-1209, USA |
PublicationTitle | Neural computation |
PublicationYear | 2021 |
Publisher | MIT Press; MIT Press Journals, The |
StartPage | 2881 |
SubjectTerms | Algorithms; Embedding; Fuzzy sets; Graphical representations; Machine learning; Neural networks; Regularization; Semi-supervised learning; Structured data; Topology |
Title | Parametric UMAP Embeddings for Representation and Semisupervised Learning |
URI | https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01434 https://www.proquest.com/docview/2896826393 https://www.proquest.com/docview/2569376068 https://pubmed.ncbi.nlm.nih.gov/PMC8516496 |
Volume | 33 |
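
The two-step procedure summarized in the abstract (build a graph over the data, then optimize an embedding of that graph by stochastic gradient descent over the weights of a parametric map) can be illustrated with a minimal sketch. This is not the authors' implementation, and several simplifications are assumptions made here for brevity: an unweighted symmetric k-nearest-neighbor graph stands in for the fuzzy simplicial complex, a linear encoder stands in for a neural network, and the low-dimensional similarity is taken as q = 1 / (1 + d^2). The names `knn_edges`, `embed`, `sgd_step`, and `fit` are hypothetical.

```python
import math
import random


def knn_edges(X, k):
    """Step 1 (simplified): symmetric k-nearest-neighbor graph, unit weights.
    Assumption: UMAP proper builds a weighted fuzzy simplicial complex."""
    edges = set()
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def embed(W, x):
    """Parametric map z = W x (a linear stand-in for a neural network)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def clip(v, lim=4.0):
    """Clamp a gradient component to [-lim, lim] to keep updates bounded."""
    return max(-lim, min(lim, v))


def sgd_step(W, xi, xj, attract, lr):
    """Step 2: one SGD update of the encoder weights for a single pair."""
    zi, zj = embed(W, xi), embed(W, xj)
    diff = [a - b for a, b in zip(zi, zj)]
    d2 = sum(d * d for d in diff) + 1e-3  # small constant avoids division by zero
    # With q = 1 / (1 + d2): graph edges minimize -log q (attraction),
    # negative samples minimize -log(1 - q) (repulsion).
    coef = 2.0 / (1.0 + d2) if attract else -2.0 / (d2 * (1.0 + d2))
    delta = [a - b for a, b in zip(xi, xj)]
    for r, row in enumerate(W):
        g = clip(coef * diff[r])  # dLoss/dz_i[r]; dLoss/dz_j[r] is its negative
        for c in range(len(row)):
            row[c] -= lr * g * delta[c]  # chain rule through z = W x


def fit(X, dim=2, k=3, epochs=50, lr=0.01, n_neg=2, seed=0):
    """Train the encoder on graph edges plus randomly sampled negative pairs."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in X[0]] for _ in range(dim)]
    edges = knn_edges(X, k)
    for _epoch in range(epochs):
        rng.shuffle(edges)
        for i, j in edges:
            sgd_step(W, X[i], X[j], True, lr)
            for _neg in range(n_neg):
                m = rng.randrange(len(X))
                if m != i:
                    sgd_step(W, X[i], X[m], False, lr)
    return W


# Toy data: two Gaussian blobs in three dimensions (illustrative only).
data_rng = random.Random(1)
X = [[c + data_rng.gauss(0.0, 0.3) for _ in range(3)]
     for c in (0.0, 3.0) for _ in range(10)]
W = fit(X)
Z = [embed(W, x) for x in X]        # embedding of the training data
z_new = embed(W, [3.0, 3.0, 3.0])   # a new point is embedded without refitting
```

Because the learned map is an explicit function of the input, embedding a new point is a single forward pass, which is the "fast online embeddings for new data" benefit the abstract refers to.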