T-Storm: Traffic-Aware Online Scheduling in Storm

Bibliographic Details
Published in	Proceedings of the International Conference on Distributed Computing Systems, pp. 535-544
Main Authors	Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2014
Subjects	Big Data; Data processing; Fasteners; Monitoring; Resource Management; Schedules; Scheduling; Storm; Storms; Stream Data Processing; Topology
Summary:	Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded; 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes; 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly; and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that, compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies, respectively (in terms of average processing time), with 30% fewer worker nodes.
ISSN:	1063-6927
DOI:	10.1109/ICDCS.2014.61
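
Illustrative sketch: the summary describes traffic-aware scheduling as assigning tasks so that inter-node traffic is minimized while no worker node is overloaded. The Python below is one simple greedy way to express that idea and is not taken from the paper; the function names, the input dictionaries (`loads`, `traffic`, `capacities`), and the greedy ordering are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_traffic_aware_assign(loads, traffic, capacities):
    """Greedily place tasks on nodes (hypothetical helper, not T-Storm's code).

    loads:      {task: load}                    assumed per-task load estimate
    traffic:    {(task_a, task_b): rate}        assumed measured tuple rate
    capacities: {node: max_total_load}          assumed per-node load limit
    """
    # Symmetric neighbour map so traffic can be looked up from either endpoint.
    neighbours = defaultdict(dict)
    for (a, b), rate in traffic.items():
        neighbours[a][b] = neighbours[a].get(b, 0) + rate
        neighbours[b][a] = neighbours[b].get(a, 0) + rate

    # Place the tasks with the most total traffic first (ties broken by name).
    order = sorted(loads, key=lambda t: (-sum(neighbours[t].values()), t))

    assignment = {}
    used = {node: 0.0 for node in capacities}
    for task in order:
        best_node, best_cost = None, None
        for node in sorted(capacities):
            if used[node] + loads[task] > capacities[node]:
                continue  # placing the task here would overload the node
            # Traffic to already-placed neighbours that sit on a different node.
            cost = sum(rate for other, rate in neighbours[task].items()
                       if other in assignment and assignment[other] != node)
            if best_cost is None or cost < best_cost:
                best_node, best_cost = node, cost
        if best_node is None:
            raise ValueError(f"no node has capacity for task {task!r}")
        assignment[task] = best_node
        used[best_node] += loads[task]
    return assignment


def inter_node_traffic(assignment, traffic):
    """Total traffic between tasks that ended up on different nodes."""
    return sum(rate for (a, b), rate in traffic.items()
               if assignment[a] != assignment[b])


if __name__ == "__main__":
    # Made-up four-task pipeline on two nodes that each fit two tasks.
    loads = {"spout": 1, "split": 1, "count": 1, "report": 1}
    traffic = {("spout", "split"): 100, ("split", "count"): 80,
               ("count", "report"): 5}
    capacities = {"n1": 2, "n2": 2}
    placement = greedy_traffic_aware_assign(loads, traffic, capacities)
    print(placement, inter_node_traffic(placement, traffic))
```

In this made-up example the heaviest link (spout to split) stays on one node, and only one link crosses nodes. A greedy pass like this does not guarantee a minimum-traffic placement; it only shows the shape of the trade-off between co-locating communicating tasks and respecting node capacity.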