Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering

Bibliographic Details
Main Authors Wang, Yiqiu; Yu, Shangdi; Gu, Yan; Shun, Julian
Format Journal Article
Language English
Published 2021-04-02
Online Access https://arxiv.org/abs/2104.01126
DOI 10.48550/arxiv.2104.01126

Abstract This paper presents new parallel algorithms for generating Euclidean minimum spanning trees and spatial clustering hierarchies (known as HDBSCAN$^*$). Our approach is based on generating a well-separated pair decomposition followed by using Kruskal's minimum spanning tree algorithm and bichromatic closest pair computations. We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$^*$. We also present a parallel approximate algorithm for OPTICS based on a recent sequential algorithm by Gan and Tao. Finally, we give a new parallel divide-and-conquer algorithm for computing the dendrogram and reachability plots, which are used in visualizing clusters of different scale that arise for both EMST and HDBSCAN$^*$. We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time). We implement our algorithms and propose a memory optimization that requires only a subset of well-separated pairs to be computed and materialized, leading to savings in both space (up to 10x) and time (up to 8x). Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13--55.89x, and existing parallel algorithms by at least an order of magnitude.
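
For orientation only, the Euclidean minimum spanning tree problem named in the abstract can be illustrated with a brute-force baseline: build every pairwise edge, sort by length, and run Kruskal's algorithm with a union-find structure. This is a minimal sequential sketch and is not the paper's method, which, per the abstract, uses a well-separated pair decomposition and bichromatic closest pair computations to avoid materializing all pairs. The function name `euclidean_mst` and the tuple-based point format are assumptions made for this illustration.

```python
import math
from itertools import combinations


def euclidean_mst(points):
    """Return (total_weight, edges) of a Euclidean MST via brute-force Kruskal.

    Illustrative baseline only: it enumerates all O(n^2) point pairs.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every candidate edge, sorted by Euclidean length.
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    total, edges = 0.0, []
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so Kruskal keeps it
            parent[ri] = rj
            total += w
            edges.append((i, j))
            if len(edges) == n - 1:
                break
    return total, edges
```

Calling `euclidean_mst([(0, 0), (1, 0), (1, 1), (5, 5)])` returns the total tree weight together with the chosen edges as index pairs. The quadratic number of candidate edges is exactly the cost that a pair decomposition is meant to reduce.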
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
OpenAccessLink https://arxiv.org/abs/2104.01126
SecondaryResourceType preprint
Subject Terms Computer Science - Data Structures and Algorithms
Computer Science - Databases
Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Learning