BTS: Load-Balanced Distributed Union-Find for Finding Connected Components with Balanced Tree Structures

Bibliographic Details
Published in: 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 1090-1102
Main Authors: Kim, Chaeeun; Han, Changhun; Park, Ha-Myung
Format: Conference Proceeding
Language: English
Published: IEEE, 13.05.2024
Summary: How can we efficiently find connected components with Union-Find in a distributed system? Union-Find is the most efficient sequential algorithm for finding connected components with low memory usage and high speed. Several studies have adapted Union-Find to distributed memory systems to process large graphs quickly; however, they all suffer from load balancing problems. We notice that the leading cause of the load balancing problems is the nature of Union-Find, which gathers more and more edges to a small number of vertices as it proceeds. In this paper, we propose BTS, a new fast and scalable distributed Union-Find algorithm for finding connected components in large graphs. BTS resolves the load balancing problems by proposing Balanced Union-Find, which allocates vertices to each processor and makes edges link to vertices in the same processor as much as possible. We further optimize BTS with edge refinement to minimize network traffic and memory usage. Experimental results show that BTS efficiently resolves the load balancing problems, processing 16-1024 times larger graphs with 3.1-261.9 times faster speeds than existing algorithms.
ISSN:2375-026X
DOI:10.1109/ICDE60146.2024.00089
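
Note: For readers unfamiliar with the sequential baseline that the summary refers to, the following is a minimal sketch of textbook Union-Find used to label connected components. It is not BTS or Balanced Union-Find, whose details are not given in this record. The function names (`find`, `union`, `connected_components`) and the choice of path halving with union by size are illustrative assumptions.

```python
# Textbook sequential Union-Find (illustrative; NOT the BTS algorithm).

def find(parent, x):
    """Return the root of x, shortening the path as we go (path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x


def union(parent, size, a, b):
    """Merge the sets containing a and b, attaching the smaller to the larger."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:
        return
    if size[root_a] < size[root_b]:
        root_a, root_b = root_b, root_a
    parent[root_b] = root_a
    size[root_a] += size[root_b]


def connected_components(num_vertices, edges):
    """Return a list mapping each vertex to a representative of its component."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    for u, v in edges:
        union(parent, size, u, v)
    return [find(parent, v) for v in range(num_vertices)]


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

In this sketch, every processed edge ends up redirecting vertices toward a shrinking set of roots, which gives an intuition for the summary's remark that Union-Find gathers more and more edges to a small number of vertices as it proceeds.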