Using Global t-SNE to Preserve Intercluster Data Structure

The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. This method finds lower-dimensional embedding of data points while minimizing distortions in distances between neighboring data points. By construction, t-SNE discar...

Full description

Saved in:
Bibliographic Details
Published inNeural computation Vol. 34; no. 8; pp. 1637 - 1651
Main Authors Zhou, Yuansheng, Sharpee, Tatyana O.
Format Journal Article
LanguageEnglish
Published One Rogers Street, Cambridge, MA 02142-1209, USA MIT Press 14.07.2022
MIT Press Journals, The
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. This method finds lower-dimensional embedding of data points while minimizing distortions in distances between neighboring data points. By construction, t-SNE discards information about large-scale structure of the data. We show that adding a global cost function to the t-SNE cost function makes it possible to cluster the data while preserving global intercluster data structure. We test the new global t-SNE (g-SNE) method on one synthetic and two real data sets on flower shapes and human brain cells. We find that significant and meaningful global structure exists in both the plant and human brain data sets. In all cases, g-SNE outperforms t-SNE and UMAP in preserving the global structure. Topological analysis of the clustering result makes it possible to find an appropriate trade-off of data distribution across scales. We find differences in how data are distributed across scales between the two subjects that were part of the human brain data set. Thus, by striving to produce both accurate clustering and positioning between clusters, the g-SNE method can identify new aspects of data organization across scales.
Bibliography:August, 2022
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:0899-7667
1530-888X
DOI:10.1162/neco_a_01504