DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems

Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks able to ease the development of streaming applications in distrib...

Full description

Saved in:
Bibliographic Details
Published inIEEE access Vol. 8; pp. 222900 - 222917
Main Authors Bordin, Maycon Viana, Griebler, Dalvan, Mencagli, Gabriele, Geyer, Claudio F. R., Fernandes, Luiz Gustavo L.
Format Journal Article
LanguageEnglish
Published Piscataway IEEE 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks able to ease the development of streaming applications in distributed computing environments like clusters and clouds. Several systems of this kind have been released and currently maintained as open source projects, like Apache Storm and Spark Streaming. Some benchmark applications have often been used by the scientific community to test and evaluate new techniques to improve the performance and usability of DSPSs. However, the existing benchmark suites lack of representative workloads coming from the wide set of application domains that can leverage the benefits offered by the stream processing paradigm in terms of near real-time performance. The goal of this article is to present a new benchmark suite composed of 15 applications coming from areas like Finance, Telecommunications, Sensor Networks, Social Networks and others. This article describes in detail the nature of these applications, their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of our benchmark suite to compare real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2020.3043948