Stochastic distributed data stream partitioning using task locality: design, implementation, and optimization

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 77, no. 10, pp. 11353-11389
Main Authors: Son, Siwoon; Im, Hyeonseung; Moon, Yang-Sae
Format: Journal Article
Language: English
Published: New York: Springer US, 2021; Springer Nature B.V.

Summary: Distributed stream processing engines (DSPEs) provide stream partitioning methods that distribute messages to tasks deployed in a distributed environment for real-time stream processing. Among these methods, the original locality-aware stream partitioning (LSP) is a binary LSP that sends messages only to downstreams on the same node as their upstreams. The binary LSP degrades performance in general configurations because it focuses only on task locality and, unlike distributed batch processing engines, does not consider downstream status. In this paper, we propose a Stochastic LSP (SLSP) method that considers not only task locality but also downstream status by computing stream partitioning probabilities based on the round-trip time to downstreams. We also present coarse-grained and fine-grained methods for probing downstreams at the node level and the process level, respectively. We then optimize our SLSP using a weighted closeness to prioritize the partitioning probabilities, and a parallel thread model to process each stage of the SLSP in parallel. Finally, we implement the SLSP in Apache Storm, a representative DSPE, and empirically evaluate it against the binary LSP. Experimental results show that our SLSP reduces latency by up to 208% while maintaining throughput similar to that of the binary LSP in general configurations. These results indicate that our SLSP performs optimized stream partitioning by reflecting downstream status as well as task locality.
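The summary describes partitioning probabilities derived from round-trip times (RTTs) to downstreams, sharpened by a "weighted closeness". The exact formulas are not given here, so the following is only a minimal sketch under stated assumptions: closeness is taken as 1/RTT, the hypothetical parameter `weight` is an exponent applied to that closeness, and probabilities are the normalized closeness values. The function names `partition_probabilities` and `choose_downstream` are illustrative, not from the paper or from Apache Storm.

```python
import random
from typing import Dict


def partition_probabilities(rtts: Dict[str, float], weight: float = 1.0) -> Dict[str, float]:
    """Hypothetical SLSP-style probabilities: closeness = (1 / RTT) ** weight, normalized.

    Assumption: a larger `weight` concentrates probability on the closest
    downstreams (e.g., tasks on the same node), approaching binary-LSP behavior.
    """
    if not rtts:
        raise ValueError("need at least one downstream task")
    if any(rtt <= 0 for rtt in rtts.values()):
        raise ValueError("round-trip times must be positive")
    closeness = {task: (1.0 / rtt) ** weight for task, rtt in rtts.items()}
    total = sum(closeness.values())
    return {task: c / total for task, c in closeness.items()}


def choose_downstream(probs: Dict[str, float], rng: random.Random) -> str:
    """Pick one downstream task for a message according to the probabilities."""
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    # Illustrative RTTs in milliseconds (made-up values, not measurements).
    rtts = {"local": 0.1, "remote-a": 0.4, "remote-b": 0.5}
    for w in (1.0, 2.0):
        print(w, partition_probabilities(rtts, weight=w))
    print(choose_downstream(partition_probabilities(rtts), random.Random(0)))
```

With `weight=1.0` the local task receives 10/14.5 (about 0.69) of the probability mass; raising `weight` to 2.0 pushes it to about 0.91, while remote downstreams keep a nonzero share. This keeps locality as the dominant factor without ignoring downstreams entirely, which is the contrast with the binary LSP that the summary draws.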
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-021-03725-4