QPOPSS: Query and Parallelism Optimized Space-Saving for finding frequent stream elements

Bibliographic Details
Published in Journal of parallel and distributed computing Vol. 204; p. 105134
Main Authors Jarlow, Victor; Stylianopoulos, Charalampos; Papatriantafilou, Marina
Format Journal Article
Language English
Published Elsevier Inc 01.10.2025
Summary: The frequent elements problem, a key component in demanding stream-data analytics, involves selecting elements whose occurrence exceeds a user-specified threshold. Fast, memory-efficient ϵ-approximate synopsis algorithms select all frequent elements but may overestimate them depending on ϵ (user-defined parameter). Evolving applications demand performance only achievable by parallelization. However, algorithmic guarantees concerning concurrent updates and queries have been overlooked. We propose Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency guarantees. A cornerstone of the design is a new approach for the main data structure for the Space-Saving algorithm, enabling support of very fast queries. QPOPSS combines minimal overlap with concurrent updates, distributing work and using fine-grained thread synchronization to achieve high throughput, accuracy, and low memory use. Our analysis shows space and approximation bounds under various concurrency and data distribution conditions. Our empirical evaluation relative to representative state-of-the-art methods reveals that QPOPSS's multithreaded throughput scales linearly while maintaining the highest accuracy, with orders of magnitude smaller memory footprint.
• QPOPSS: a memory-efficient parallel algorithm for frequent element detection in data streams.
• QOSS: a query-optimized variant of Space-Saving with improved top-k query performance.
• Open-source implementation with comprehensive evaluation on real and synthetic datasets.
• QPOPSS outperforms state-of-the-art in accuracy and memory efficiency under tight constraints.
• Strong scalability and robustness on skewed data and large query workloads.
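For readers unfamiliar with the ϵ-approximate counting that the summary builds on, the following is a minimal sketch of the classic sequential Space-Saving algorithm. It is not the QPOPSS or QOSS design described in the article: it has no concurrency support and uses a plain dictionary with a linear scan for the minimum. The class name `SpaceSaving`, its methods, and the choice of ⌈1/ϵ⌉ counters are assumptions made for illustration only.

```python
from math import ceil


class SpaceSaving:
    """Classic sequential Space-Saving summary (illustrative, not QPOPSS)."""

    def __init__(self, epsilon):
        # Assumption: ceil(1/epsilon) counters, the usual textbook sizing.
        self.capacity = ceil(1 / epsilon)
        self.counts = {}  # element -> estimated count (never an underestimate)
        self.errors = {}  # element -> maximum possible overestimation
        self.n = 0        # number of stream elements seen so far

    def update(self, item):
        self.n += 1
        if item in self.counts:
            # Already monitored: just increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # A free counter is available: start monitoring the item.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the element with the smallest count and let the
            # newcomer inherit that count as its possible overestimation.
            victim = min(self.counts, key=self.counts.get)  # linear scan
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def frequent(self, phi):
        # Report every monitored element whose estimate exceeds phi * n.
        # Estimates may be too high, so false positives are possible.
        threshold = phi * self.n
        return {x: c for x, c in self.counts.items() if c > threshold}


if __name__ == "__main__":
    summary = SpaceSaving(epsilon=0.5)
    for element in ["a"] * 6 + ["b"] * 3 + ["c"]:
        summary.update(element)
    print(summary.frequent(phi=0.5))
```

In this sketch the counters always sum to the stream length, so the smallest counter, and therefore any overestimation, is bounded by the stream length divided by the number of counters. The linear scan for the minimum is the simplest possible choice, not a performance-oriented one.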
ISSN: 0743-7315, 1096-0848
DOI: 10.1016/j.jpdc.2025.105134