Heavy-Tailed Kernels Reveal a Finer Cluster Structure in t-SNE Visualisations
Published in | Machine Learning and Knowledge Discovery in Databases, Vol. 11906, pp. 124-139 |
---|---|
Main Authors | Dmitry Kobak, George Linderman, Stefan Steinerberger, Yuval Kluger, Philipp Berens |
Format | Book Chapter; Journal Article |
Language | English |
Published | Cham: Springer International Publishing, 2020 |
Series | Lecture Notes in Computer Science |
Subjects | Data visualisation; Dimensionality reduction; t-SNE |
Online Access | http://link.springer.com/10.1007/978-3-030-46150-8_8 |
Abstract | T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE in its low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the ‘crowding problem’ of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom $\nu$, with $\nu \rightarrow \infty$ corresponding to SNE and $\nu = 1$ corresponding to standard t-SNE. Using theoretical analysis and toy examples, we show that $\nu < 1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data. |
---|---|
Author | Kobak, Dmitry; Linderman, George; Steinerberger, Stefan; Kluger, Yuval; Berens, Philipp |
Author_xml | 1. Kobak, Dmitry (dmitry.kobak@uni-tuebingen.de); 2. Linderman, George; 3. Steinerberger, Stefan; 4. Kluger, Yuval; 5. Berens, Philipp |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33103160 (View this record in MEDLINE/PubMed) |
ContentType | Book Chapter; Journal Article |
Copyright | The Author(s) 2020, Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. |
DBID | AAQKC NPM |
DOI | 10.1007/978-3-030-46150-8_8 |
DatabaseName | SpringerLink Fully Open Access Books; PubMed |
DatabaseTitle | PubMed |
DatabaseTitleList | PubMed |
Database_xml | 1. AAQKC: SpringerLink Fully Open Access Books, https://link.springer.com (source type: Publisher); 2. NPM: PubMed, https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed (source type: Index Database) |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 3030461505; 9783030461508 |
EISSN | 1611-3349 |
Editor | Brefeld, Ulf; Fromont, Elisa; Hotho, Andreas; Knobbe, Arno; Maathuis, Marloes; Robardet, Céline |
Editor_xml | 1. Brefeld, Ulf (ulf.brefeld@leuphana.de); 2. Fromont, Elisa (ORCID 0000-0003-0133-3491; elisa.fromont@irisa.fr); 3. Hotho, Andreas (ORCID 0000-0002-0483-5772; hotho@informatik.uni-wuerzburg.de); 4. Knobbe, Arno (ORCID 0000-0002-0335-5099; a.j.knobbe@liacs.leidenuniv.nl); 5. Maathuis, Marloes (ORCID 0000-0002-3398-9893; maathuis@stat.math.ethz.ch); 6. Robardet, Céline (ORCID 0000-0002-8583-9408; Celine.Robardet@insa-lyon.fr) |
EndPage | 139 |
ExternalDocumentID | 33103160 |
Genre | Journal Article |
GrantInformation_xml | NHGRI NIH HHS: F30 HG010102; NIGMS NIH HHS: R01 GM131642; NIGMS NIH HHS: T32 GM007205; NHGRI NIH HHS: R01 HG008383; NIMH NIH HHS: U19 MH114830 |
GroupedDBID | -DT -GH -~X 1SB 29L 2HA 2HV 5QI 875 AAQKC AASHB ABMNI ACGFS ADCXD AEFIE ALMA_UNASSIGNED_HOLDINGS EJD F5P FEDTE HVGLF LAS LDH P2P RIG RNI RSU SVGTG VI1 ~02 NPM |
ID | FETCH-LOGICAL-p443t-df8b80f92b4221dad67ecbe7e26a975b8275784cd6a2f1405ae0d0858eebf6a3 |
IEDL.DBID | AAQKC |
ISBN | 9783030461492; 3030461491 |
ISSN | 0302-9743 |
IngestDate | Wed Feb 19 02:10:20 EST 2025; Tue Jul 29 20:08:28 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | t-SNE; dimensionality reduction; data visualisation |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-p443t-df8b80f92b4221dad67ecbe7e26a975b8275784cd6a2f1405ae0d0858eebf6a3 |
Notes | Electronic supplementary material: The online version of this chapter (10.1007/978-3-030-46150-8_8) contains supplementary material, which is available to authorized users. The original version of this chapter was revised: the supplementary file and its link have been added. The correction to this chapter is available at 10.1007/978-3-030-46150-8_44 |
OpenAccessLink | http://link.springer.com/10.1007/978-3-030-46150-8_8 |
PMID | 33103160 |
PageCount | 16 |
ParticipantIDs | pubmed_primary_33103160 springer_books_10_1007_978_3_030_46150_8_8 |
PublicationCentury | 2000 |
PublicationDate | 2020 2020-00-00 |
PublicationDateYYYYMMDD | 2020-01-01 |
PublicationDate_xml | – year: 2020 text: 2020 |
PublicationDecade | 2020 |
PublicationPlace | Cham |
PublicationPlace_xml | Cham; Switzerland |
PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I |
PublicationTitle | Machine Learning and Knowledge Discovery in Databases |
PublicationTitleAlternate | Mach Learn Knowl Discov Databases |
PublicationYear | 2020 |
Publisher | Springer International Publishing |
Publisher_xml | – name: Springer International Publishing |
RelatedPersons | Goos, Gerhard; Hartmanis, Juris; Bertino, Elisa; Gao, Wen; Steffen, Bernhard; Woeginger, Gerhard; Yung, Moti |
RelatedPersons_xml | 1. Goos, Gerhard; 2. Hartmanis, Juris; 3. Bertino, Elisa; 4. Gao, Wen; 5. Steffen, Bernhard; 6. Woeginger, Gerhard; 7. Yung, Moti |
SSID | ssj0002299417 ssj0002792 |
Score | 2.125013 |
SourceID | pubmed springer |
SourceType | Index Database Publisher |
StartPage | 124 |
SubjectTerms | Data visualisation; Dimensionality reduction; t-SNE |
Title | Heavy-Tailed Kernels Reveal a Finer Cluster Structure in t-SNE Visualisations |
URI | http://link.springer.com/10.1007/978-3-030-46150-8_8; https://www.ncbi.nlm.nih.gov/pubmed/33103160 |
Volume | 11906 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Springer Nature |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Machine+Learning+and+Knowledge+Discovery+in+Databases&rft.au=Kobak%2C+Dmitry&rft.au=Linderman%2C+George&rft.au=Steinerberger%2C+Stefan&rft.au=Kluger%2C+Yuval&rft.atitle=Heavy-Tailed+Kernels+Reveal+a+Finer+Cluster+Structure+in+t-SNE+Visualisations&rft.series=Lecture+Notes+in+Computer+Science&rft.date=2020-01-01&rft.pub=Springer+International+Publishing&rft.isbn=9783030461492&rft.issn=0302-9743&rft.eissn=1611-3349&rft.spage=124&rft.epage=139&rft_id=info:doi/10.1007%2F978-3-030-46150-8_8 |
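The abstract parametrises the low-dimensional similarity kernel by a degree of freedom $\nu$. In that parametrisation, $\nu = 1$ gives standard t-SNE and $\nu \rightarrow \infty$ gives SNE. The following is a minimal sketch of how smaller $\nu$ fattens the kernel's tails. It assumes the form $w(d) = (1 + d^2/\nu)^{-\nu}$, which reduces to the Cauchy kernel $1/(1 + d^2)$ at $\nu = 1$ and tends to $\exp(-d^2)$ as $\nu \rightarrow \infty$. This parametrisation is an assumption made for illustration, so check the full text for the exact form the authors use. The helpers `kernel` and `affinities` are hypothetical names. The pairwise loop is a naive $O(n^2)$ illustration, not the efficient implementation described in the abstract.

```python
import math
from typing import List, Sequence


def kernel(d_sq: float, nu: float) -> float:
    """Similarity kernel w(d) = (1 + d^2 / nu)^(-nu), given squared distance d_sq.

    Assumed parametrisation (illustrative only): nu = 1 is the Cauchy kernel of
    standard t-SNE, nu -> infinity approaches the Gaussian exp(-d^2) of SNE,
    and smaller nu gives heavier tails.
    """
    if nu <= 0:
        raise ValueError("nu must be positive")
    if math.isinf(nu):
        return math.exp(-d_sq)  # Gaussian limit (SNE-like)
    return (1.0 + d_sq / nu) ** (-nu)


def affinities(Y: Sequence[Sequence[float]], nu: float) -> List[List[float]]:
    """Normalised low-dimensional affinities q_ij = w_ij / sum_{k != l} w_kl.

    Hypothetical helper: a naive all-pairs loop over embedding points Y.
    """
    n = len(Y)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays zero, as in t-SNE
                d_sq = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                W[i][j] = kernel(d_sq, nu)
    Z = sum(sum(row) for row in W)  # normalisation over all off-diagonal pairs
    return [[w / Z for w in row] for row in W]


if __name__ == "__main__":
    # Compare tail weight at squared distance 100 for SNE-like, t-SNE and nu < 1.
    for nu in (math.inf, 1.0, 0.5):
        print(f"nu={nu}: w(d^2=100) = {kernel(100.0, nu):.3g}")
```

At squared distance 100, the Gaussian limit gives $e^{-100} \approx 3.7 \times 10^{-44}$. The value is $1/101 \approx 0.0099$ at $\nu = 1$ and $201^{-1/2} \approx 0.071$ at $\nu = 0.5$. So decreasing $\nu$ below 1 gives distant pairs substantially more kernel weight, which is what the abstract calls a heavier-tailed kernel.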