BAN-Storm: a Bandwidth-Aware Scheduling Mechanism for Stream Jobs

Bibliographic Details
Published in: Journal of Grid Computing, Vol. 19, no. 3
Main Authors: Muhammad, Asif; Aleem, Muhammad
Format: Journal Article
Language: English
Published: Dordrecht, Springer Netherlands, 01.09.2021
Springer Nature B.V.
Summary: The essential components of a Big Data system are the processing frameworks and engines responsible for crunching the data. To cope with the growing computing demands of real-time Big Data applications, researchers have proposed several computing frameworks. The core of these frameworks, i.e., the scheduling mechanism for real-time stream processing, needs to accommodate several important aspects, such as resource awareness, heterogeneity of the computing resources, and load balancing. These aspects contribute significantly to the attained performance of the computing frameworks, so ignoring any one of them may degrade performance. Most present stream processing frameworks do not consider communication patterns or the heterogeneity of the computing resources. As a result, highly communicating tasks are mapped onto different, costly remote nodes, which increases communication overhead and latency. In this work, we propose BAN-Storm, a stream scheduler that considers inter-task communication along with other important scheduling aspects, such as heterogeneity, when scheduling stream jobs. The core objective of the proposed scheduler is to gain performance (i.e., higher throughput and reduced latency) using a resource-aware mapping mechanism. BAN-Storm schedules stream jobs by considering inter-task communication and each machine's computing power, using a two-phase mapping mechanism. In the first phase, tasks are grouped so that inter-group communication is low. In the second phase, for resource-aware mapping, the computing power of each node is calculated using FLOPS, memory (i.e., RAM), and bandwidth, and the task groups are then assigned to nodes, with more capable nodes mapped first. BAN-Storm is implemented in Apache Storm. Experimental evaluation is done using two real application topologies.
The attained results are benchmarked against three state-of-the-art stream schedulers. The experimental results show up to 30% higher throughput compared to the Apache Storm scheduler. Moreover, the results show that BAN-Storm provisions up to 33–66% fewer resources than the default Storm scheduler.
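The two-phase mapping described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the grouping algorithm or the formula for computing power, so the greedy merge in `group_tasks`, the equally weighted normalized score in `node_score`, and all names and sample values below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node description; field names and units are assumptions.
    name: str
    gflops: float
    ram_gb: float
    bandwidth_mbps: float


def group_tasks(tasks, traffic, num_groups):
    """Phase 1 (assumed greedy heuristic): repeatedly merge the two groups
    that exchange the most traffic, so heavy links end up inside a group."""
    groups = [{t} for t in tasks]

    def cross(a, b):
        # Total traffic flowing between group a and group b.
        return sum(rate for (u, v), rate in traffic.items()
                   if (u in a and v in b) or (u in b and v in a))

    while len(groups) > num_groups:
        pairs = [(i, j) for i in range(len(groups))
                 for j in range(i + 1, len(groups))]
        i, j = max(pairs, key=lambda p: cross(groups[p[0]], groups[p[1]]))
        groups[i] |= groups[j]
        del groups[j]
    return groups


def node_score(node, nodes):
    """Phase 2 score (assumed formula): each metric is normalized by the
    cluster maximum and the three metrics are weighted equally."""
    return (node.gflops / max(n.gflops for n in nodes)
            + node.ram_gb / max(n.ram_gb for n in nodes)
            + node.bandwidth_mbps / max(n.bandwidth_mbps for n in nodes))


def map_groups(groups, nodes):
    """Assign the largest task groups to the most capable nodes first."""
    ranked_nodes = sorted(nodes, key=lambda n: node_score(n, nodes), reverse=True)
    ranked_groups = sorted(groups, key=len, reverse=True)
    return {n.name: sorted(g) for n, g in zip(ranked_nodes, ranked_groups)}


# Made-up word-count-style topology with tuple rates between tasks.
tasks = ["spout", "split", "count", "sink"]
traffic = {("spout", "split"): 100, ("split", "count"): 80, ("count", "sink"): 5}
nodes = [Node("small", 50, 8, 100), Node("big", 200, 32, 1000)]

groups = group_tasks(tasks, traffic, num_groups=2)
mapping = map_groups(groups, nodes)
print(mapping)  # {'big': ['count', 'split', 'spout'], 'small': ['sink']}
```

In this toy run, the three heavily communicating tasks share the more capable node and only the low-rate link to `sink` crosses nodes. A real scheduler would also need to respect per-node capacity limits, which this sketch omits.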
ISSN: 1570-7873, 1572-9184
DOI: 10.1007/s10723-021-09567-x