PADDi: Highly Scalable Parallel Algorithm for Discord Discovery on Multi-GPU Clusters


Bibliographic Details
Published in: Lobachevskii Journal of Mathematics, Vol. 46, No. 4, pp. 1480–1494
Main Authors: Kraeva, Y. A.; Zymbler, M. L.
Format: Journal Article
Language: English
Published: Moscow: Pleiades Publishing, 01.04.2025 (Springer Nature B.V.)

Summary: Currently, in a wide spectrum of subject domains, time series data mining requires efficient subsequence anomaly discovery in a very long time series that cannot be entirely placed in RAM. At present, one of the best approaches to this problem is to formalize the anomaly as a discord: a given-length subsequence that is maximally far away from its non-overlapping nearest neighbor. In this article, we introduce a novel parallel algorithm called PADDi (PALMAD-based Anomaly Discovery on Distributed GPUs), which discovers arbitrary-length discords in a very long time series on a high-performance cluster whose nodes are each equipped with multiple GPUs. The algorithm exploits two-level parallelism: first, the time series is divided into equal-length fragments stored on disks associated with the cluster nodes; second, each fragment is split into equal-length segments processed by the GPUs of the respective node. Data exchanges between nodes and calculations on the GPUs within a node are implemented with the MPI and CUDA technologies, respectively. The algorithm proceeds as follows. First, in each segment processed by one GPU, the algorithm selects potential discords and then discards false positives, producing a local candidate set. Next, the local candidate sets are exchanged among the cluster nodes in an "all-to-all" manner, yielding a global candidate set. Each cluster node then refines the global candidates against its own fragment, obtaining a local resulting set of true-positive discords. Finally, each cluster node sends its local resulting set to a master node, which outputs the end result as the intersection of the received local resulting sets.
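The discord definition above (a fixed-length subsequence maximally far from its non-overlapping nearest neighbor) can be illustrated with a brute-force, single-machine sketch. This is not the authors' PALMAD/PADDi implementation, just a minimal Python baseline showing what the distributed algorithm computes:

```python
import math

def discord(ts, m):
    """Return (index, distance) of the top-1 discord of length m: the
    subsequence whose nearest NON-OVERLAPPING neighbor is farthest away.
    Brute force, O(n^2) subsequence comparisons."""
    n = len(ts) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        # Nearest neighbor among subsequences that do not overlap ts[i:i+m]
        nn = min(
            math.dist(ts[i:i + m], ts[j:j + m])
            for j in range(n) if abs(i - j) >= m
        )
        if nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist
```

On a flat series with a single spike, e.g. `discord([0.0]*8 + [10.0] + [0.0]*8, 3)`, the discord is the first subsequence touching the spike, since every non-overlapping neighbor of such a subsequence is far away.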
Extensive experiments over real-world and synthetic million-length time series on various configurations of two high-performance clusters with different GPU models on board (48 to 64 GPUs in total) showed that our algorithm's scalability remains linear, without stagnation or degradation.
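The select-exchange-refine-intersect workflow described in the summary can also be sketched in miniature. The simulation below is hypothetical and single-process (the names `local_candidates`, `refine`, and the distance threshold `r` are our own, not from the paper, which ranks discords by nearest-neighbor distance rather than thresholding): each fragment selects candidates whose in-fragment non-overlapping nearest neighbor is at least `r` away, the candidate sets are merged as the all-to-all exchange would merge them, each "node" refines the global set against its fragment, and the "master" intersects the survivors.

```python
import math

def local_candidates(ts, frag, m, r):
    """Stand-in for the per-GPU phase: within fragment frag = (lo, hi),
    keep subsequences whose nearest non-overlapping neighbor INSIDE the
    fragment is at least r away. Returns global start indices."""
    lo, hi = frag
    idx = range(lo, hi - m + 1)
    cand = set()
    for i in idx:
        nn = min((math.dist(ts[i:i + m], ts[j:j + m])
                  for j in idx if abs(i - j) >= m),
                 default=float("inf"))
        if nn >= r:
            cand.add(i)
    return cand

def refine(ts, frag, cand, m, r):
    """Stand-in for per-node refinement: a global candidate survives on a
    node only if no non-overlapping subsequence of the node's fragment
    lies closer than r to it."""
    lo, hi = frag
    return {c for c in cand
            if all(math.dist(ts[c:c + m], ts[j:j + m]) >= r
                   for j in range(lo, hi - m + 1) if abs(c - j) >= m)}

# Miniature run: a flat series with one spike, split into two fragments
# that overlap by m - 1 points so no boundary subsequence is lost.
ts = [0.0] * 10 + [5.0] + [0.0] * 10
m, r = 3, 3.0
frags = [(0, 12), (10, 21)]

# Per-fragment candidate selection, then an "all-to-all"-style union.
global_cand = set().union(*(local_candidates(ts, f, m, r) for f in frags))

# Each "node" refines the global set; the "master" intersects the results.
discords = set.intersection(*(refine(ts, f, global_cand, m, r) for f in frags))
```

The intersection at the end mirrors the master-node step in the summary: a candidate is a true discord only if it keeps its distance from every non-overlapping subsequence in every fragment, i.e. it survives refinement on all nodes.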
ISSN: 1995-0802; 1818-9962
DOI: 10.1134/S1995080225606198