QPOPSS: Query and Parallelism Optimized Space-Saving for finding frequent stream elements
Published in | Journal of Parallel and Distributed Computing, Vol. 204; p. 105134 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.10.2025 |
Subjects | |
Summary: | The frequent elements problem, a key component in demanding stream-data analytics, involves selecting elements whose occurrence exceeds a user-specified threshold. Fast, memory-efficient ϵ-approximate synopsis algorithms select all frequent elements but may overestimate their frequencies depending on ϵ (a user-defined parameter). Evolving applications demand performance only achievable through parallelization. However, algorithmic guarantees concerning concurrent updates and queries have been overlooked. We propose Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency guarantees. A cornerstone of the design is a new approach to the main data structure of the Space-Saving algorithm, enabling very fast queries. QPOPSS combines minimal overlap with concurrent updates, distributing work and using fine-grained thread synchronization to achieve high throughput, accuracy, and low memory use. Our analysis shows space and approximation bounds under various concurrency and data distribution conditions. Our empirical evaluation relative to representative state-of-the-art methods reveals that QPOPSS's multithreaded throughput scales linearly while maintaining the highest accuracy, with an orders-of-magnitude smaller memory footprint. |
---|---|
ISSN: | 0743-7315 1096-0848 |
DOI: | 10.1016/j.jpdc.2025.105134 |

Highlights:
• QPOPSS: a memory-efficient parallel algorithm for frequent element detection in data streams.
• QOSS: a query-optimized variant of Space-Saving with improved top-k query performance.
• Open-source implementation with comprehensive evaluation on real and synthetic datasets.
• QPOPSS outperforms the state of the art in accuracy and memory efficiency under tight constraints.
• Strong scalability and robustness on skewed data and large query workloads.
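The summary builds on the classic Space-Saving algorithm (Metwally, Agrawal and El Abbadi). As a point of reference only, here is a minimal sequential sketch of that classic algorithm, assuming a single thread and m = ⌈1/ϵ⌉ counters; it is not QPOPSS or QOSS, whose data structure and concurrency scheme are described only in the full text. The class and method names (`SpaceSaving`, `update`, `frequent`) are hypothetical, and eviction uses a simple O(m) scan instead of the original Stream-Summary structure. The sketch shows where the ϵ-dependent overestimation comes from: a newly monitored element inherits the evicted minimum count as its error.

```python
import math
from collections.abc import Hashable


class SpaceSaving:
    """Classic sequential Space-Saving sketch (not QPOPSS or QOSS).

    Monitors at most m = ceil(1 / epsilon) elements. For each monitored
    element it stores an estimated count and an error bound such that
    count - error <= true_count <= count, and count - true_count <= epsilon * n.
    """

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.capacity = math.ceil(1 / epsilon)
        self.counts: dict[Hashable, int] = {}  # estimated counts
        self.errors: dict[Hashable, int] = {}  # max overestimation per element
        self.n = 0                             # stream items processed so far

    def update(self, item: Hashable) -> None:
        self.n += 1
        if item in self.counts:
            # Monitored element: exact increment.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: start monitoring with no error.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # All counters in use: replace the element with the smallest count.
            # This O(m) scan stands in for the O(1) Stream-Summary structure.
            victim = min(self.counts, key=self.counts.__getitem__)
            min_count = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = min_count + 1
            self.errors[item] = min_count

    def frequent(self, phi: float) -> list[tuple[Hashable, int, bool]]:
        """Report (item, estimated count, guaranteed) for every count > phi * n.

        For phi >= epsilon no truly frequent element is missed; 'guaranteed'
        is True when count - error > phi * n, i.e. the element is certainly
        frequent rather than possibly overestimated.
        """
        threshold = phi * self.n
        hits = [
            (item, count, count - self.errors[item] > threshold)
            for item, count in self.counts.items()
            if count > threshold
        ]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    sketch = SpaceSaving(epsilon=0.1)  # 10 counters
    stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
    for element in stream:
        sketch.update(element)
    print(sketch.frequent(phi=0.2))  # [('a', 50, True), ('b', 30, True)]
```

Because every update adds exactly one to the sum of the counters, the smallest counter never exceeds n/m ≤ ϵn, which bounds both the error of any estimate and the frequency of any element that can go unmonitored. A parallel design such as the one summarized above must preserve bounds of this kind while updates and queries run concurrently.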