Heavy-Tailed Kernels Reveal a Finer Cluster Structure in t-SNE Visualisations


Bibliographic Details
Published in Machine Learning and Knowledge Discovery in Databases, Vol. 11906, pp. 124-139
Main Authors Kobak, Dmitry; Linderman, George; Steinerberger, Stefan; Kluger, Yuval; Berens, Philipp
Format Book Chapter; Journal Article
Language English
Published Cham: Springer International Publishing, 2020
Series Lecture Notes in Computer Science
Abstract T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the ‘crowding problem’ of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom $\nu$, with $\nu \rightarrow \infty$ corresponding to SNE and $\nu = 1$ corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that $\nu < 1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
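Illustrative note: the relationship between the kernels named in the abstract can be sketched numerically. The snippet below is a minimal sketch, not the chapter's implementation. It assumes the textbook unnormalised Student-t shape (1 + d^2/nu)^(-(nu+1)/2); the exact parameterisation used in the chapter may differ, and the function names `t_kernel`, `cauchy_kernel` and `gaussian_kernel` are hypothetical helpers introduced only for this illustration.

```python
import math


def t_kernel(d, nu):
    """Unnormalised Student-t similarity for distance d >= 0 and nu > 0.

    Assumed form (textbook t-density shape, not necessarily the chapter's
    own parameterisation): (1 + d^2 / nu) ** (-(nu + 1) / 2).
    """
    return (1.0 + d * d / nu) ** (-(nu + 1.0) / 2.0)


def cauchy_kernel(d):
    """Cauchy kernel of standard t-SNE; equals t_kernel(d, 1)."""
    return 1.0 / (1.0 + d * d)


def gaussian_kernel(d):
    """Gaussian kernel of SNE; the limit of t_kernel as nu grows large."""
    return math.exp(-d * d / 2.0)


if __name__ == "__main__":
    # Compare kernel values at a few distances for nu < 1, nu = 1 and large nu.
    for d in (0.5, 2.0, 10.0):
        print(
            f"d={d:5.1f}  nu=0.5: {t_kernel(d, 0.5):.5f}  "
            f"nu=1: {t_kernel(d, 1.0):.5f}  "
            f"nu=1e6: {t_kernel(d, 1e6):.5f}  "
            f"gauss: {gaussian_kernel(d):.5f}"
        )
```

Under this assumed form, nu = 1 reproduces the Cauchy kernel exactly, a very large nu approaches the Gaussian kernel, and nu < 1 assigns more similarity than the Cauchy kernel at large distances, which is the sense in which the tail is heavier.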
Author Steinerberger, Stefan
Kobak, Dmitry
Linderman, George
Kluger, Yuval
Berens, Philipp
Author_xml – sequence: 1
  givenname: Dmitry
  surname: Kobak
  fullname: Kobak, Dmitry
  email: dmitry.kobak@uni-tuebingen.de
– sequence: 2
  givenname: George
  surname: Linderman
  fullname: Linderman, George
– sequence: 3
  givenname: Stefan
  surname: Steinerberger
  fullname: Steinerberger, Stefan
– sequence: 4
  givenname: Yuval
  surname: Kluger
  fullname: Kluger, Yuval
– sequence: 5
  givenname: Philipp
  surname: Berens
  fullname: Berens, Philipp
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33103160$$D View this record in MEDLINE/PubMed
ContentType Book Chapter
Journal Article
Copyright The Author(s) 2020. Open Access: This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
DBID AAQKC
NPM
DOI 10.1007/978-3-030-46150-8_8
DatabaseName SpringerLink Fully Open Access Books
PubMed
DatabaseTitle PubMed
DatabaseTitleList
PubMed
Database_xml – sequence: 1
  dbid: AAQKC
  name: SpringerLink Fully Open Access Books
  url: https://link.springer.com
  sourceTypes: Publisher
– sequence: 2
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 3030461505
9783030461508
EISSN 1611-3349
Editor Maathuis, Marloes
Fromont, Elisa
Brefeld, Ulf
Knobbe, Arno
Hotho, Andreas
Robardet, Céline
Editor_xml – sequence: 1
  givenname: Ulf
  surname: Brefeld
  fullname: Brefeld, Ulf
  email: ulf.brefeld@leuphana.de
– sequence: 2
  givenname: Elisa
  orcidid: 0000-0003-0133-3491
  surname: Fromont
  fullname: Fromont, Elisa
  email: elisa.fromont@irisa.fr
– sequence: 3
  givenname: Andreas
  orcidid: 0000-0002-0483-5772
  surname: Hotho
  fullname: Hotho, Andreas
  email: hotho@informatik.uni-wuerzburg.de
– sequence: 4
  givenname: Arno
  orcidid: 0000-0002-0335-5099
  surname: Knobbe
  fullname: Knobbe, Arno
  email: a.j.knobbe@liacs.leidenuniv.nl
– sequence: 5
  givenname: Marloes
  orcidid: 0000-0002-3398-9893
  surname: Maathuis
  fullname: Maathuis, Marloes
  email: maathuis@stat.math.ethz.ch
– sequence: 6
  givenname: Céline
  orcidid: 0000-0002-8583-9408
  surname: Robardet
  fullname: Robardet, Céline
  email: Celine.Robardet@insa-lyon.fr
EndPage 139
ExternalDocumentID 33103160
Genre Journal Article
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: F30 HG010102
– fundername: NIGMS NIH HHS
  grantid: R01 GM131642
– fundername: NIGMS NIH HHS
  grantid: T32 GM007205
– fundername: NHGRI NIH HHS
  grantid: R01 HG008383
– fundername: NIMH NIH HHS
  grantid: U19 MH114830
ISBN 9783030461492
3030461491
ISSN 0302-9743
IngestDate Wed Feb 19 02:10:20 EST 2025
Tue Jul 29 20:08:28 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords t-SNE
dimensionality reduction
data visualisation
Language English
Notes Electronic supplementary material: The online version of this chapter (10.1007/978-3-030-46150-8_8) contains supplementary material, which is available to authorized users.
The original version of this chapter was revised: the supplementary file and its link have been added. The correction to this chapter is available at 10.1007/978-3-030-46150-8_44
OpenAccessLink http://link.springer.com/10.1007/978-3-030-46150-8_8
PMID 33103160
PageCount 16
ParticipantIDs pubmed_primary_33103160
springer_books_10_1007_978_3_030_46150_8_8
PublicationCentury 2000
PublicationDate 2020
2020-00-00
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – year: 2020
  text: 2020
PublicationDecade 2020
PublicationPlace Cham
PublicationPlace_xml – name: Cham
– name: Germany
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I
PublicationTitle Machine Learning and Knowledge Discovery in Databases
PublicationTitleAlternate Mach Learn Knowl Discov Databases
PublicationYear 2020
Publisher Springer International Publishing
Publisher_xml – name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  surname: Yung
  fullname: Yung, Moti
SourceID pubmed
springer
SourceType Index Database
Publisher
StartPage 124
SubjectTerms Data visualisation
Dimensionality reduction
t-SNE
Title Heavy-Tailed Kernels Reveal a Finer Cluster Structure in t-SNE Visualisations
URI http://link.springer.com/10.1007/978-3-030-46150-8_8
https://www.ncbi.nlm.nih.gov/pubmed/33103160
Volume 11906