BAN-Storm: a Bandwidth-Aware Scheduling Mechanism for Stream Jobs
Published in | Journal of grid computing Vol. 19; no. 3 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Dordrecht: Springer Netherlands, 01.09.2021; Springer Nature B.V |
Summary: | The essential components of a Big Data system are the processing frameworks and engines responsible for crunching the data. To cope with the growing computing demands of real-time Big Data applications, researchers have proposed several computing frameworks. The core of these frameworks, i.e., the scheduling mechanism for real-time stream processing, needs to accommodate several important aspects such as resource awareness, heterogeneity of the computing resources, and load balancing. These aspects contribute significantly to the attained performance of the computing frameworks, so ignoring any one of them may degrade performance. Most present stream processing frameworks do not consider communication patterns or the heterogeneity of the computing resources. As a result, heavily communicating tasks are mapped onto different, costly remote nodes, which increases communication overhead and latency. In this work, we propose BAN-Storm, a stream scheduler that considers inter-task communication along with other important scheduling aspects, such as heterogeneity, when scheduling stream jobs. The core objective of the proposed scheduler is to gain performance (i.e., higher throughput and reduced latency) using a resource-aware mapping mechanism. BAN-Storm schedules stream jobs considering inter-task communication and each machine's computing power. It employs a two-phase mapping mechanism. In the first phase, the tasks are grouped so that inter-group communication is low. In the second phase, for the resource-aware mapping, the computing power of each node is calculated using FLOPS, memory (i.e., RAM), and bandwidth, and the task groups are then assigned to nodes, with more capable nodes filled first. BAN-Storm is implemented on Apache Storm. The experimental evaluation uses two real application topologies, and the results are benchmarked against three state-of-the-art stream schedulers. The results show up to 30% higher throughput compared to the default Apache Storm scheduler. Moreover, BAN-Storm provisions 33–66% fewer resources than default Storm. |
---|---|
ISSN: | 1570-7873 1572-9184 |
DOI: | 10.1007/s10723-021-09567-x |
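
The two-phase mapping described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the grouping algorithm or the formula that combines FLOPS, RAM, and bandwidth, so the greedy edge-merging heuristic, the equal-weight normalized score, the group size cap, and all names (`group_tasks`, `node_power`, `map_groups`, the sample topology and nodes) are assumptions made for illustration only.

```python
from collections import defaultdict

METRICS = ("flops", "ram_gb", "bandwidth_mbps")  # hypothetical field names


def group_tasks(tasks, traffic, max_group_size):
    """Phase 1 (assumed heuristic): merge the heaviest-communicating task
    pairs into one group, subject to a size cap, so that the traffic left
    between groups is low."""
    group_of = {t: i for i, t in enumerate(tasks)}  # each task starts alone
    members = {i: [t] for i, t in enumerate(tasks)}
    # Visit task pairs from heaviest to lightest traffic.
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        ga, gb = group_of[a], group_of[b]
        if ga != gb and len(members[ga]) + len(members[gb]) <= max_group_size:
            for t in members[gb]:
                group_of[t] = ga
            members[ga].extend(members.pop(gb))
    return list(members.values())


def node_power(node, maxima, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Phase 2 scoring (assumed formula): weighted sum of each metric
    normalized by the cluster-wide maximum for that metric."""
    return sum(w * node[k] / maxima[k] for w, k in zip(weights, METRICS))


def map_groups(groups, nodes):
    """Phase 2 mapping: rank nodes by computed power and assign task groups
    to the more capable nodes first."""
    maxima = {k: max(n[k] for n in nodes.values()) for k in METRICS}
    ranked = sorted(nodes, key=lambda name: node_power(nodes[name], maxima),
                    reverse=True)
    ordered = sorted(groups, key=len, reverse=True)  # largest groups first
    placement = defaultdict(list)
    for i, group in enumerate(ordered):
        # Wrap around if there are more groups than nodes.
        placement[ranked[i % len(ranked)]].extend(group)
    return dict(placement)


# Hypothetical four-task topology; traffic values are tuples per second.
tasks = ["spout", "split", "count", "store"]
traffic = {("spout", "split"): 100, ("split", "count"): 80,
           ("count", "store"): 10}
nodes = {
    "n1": {"flops": 50, "ram_gb": 8, "bandwidth_mbps": 100},
    "n2": {"flops": 100, "ram_gb": 16, "bandwidth_mbps": 1000},
}

groups = group_tasks(tasks, traffic, max_group_size=2)
placement = map_groups(groups, nodes)
print(groups)
print(placement)
```

In this sketch the heaviest edge (`spout` to `split`) is kept inside one group, and that group lands on the node with the higher computed score. A real Storm scheduler would additionally have to respect worker slots and per-node capacity, which the sketch ignores.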