Elastic Scaling for Data Stream Processing

Bibliographic Details
Published in	IEEE Transactions on Parallel and Distributed Systems, Vol. 25, no. 6, pp. 1447-1463
Main Authors	Gedik, Bugra; Schneider, Scott; Hirzel, Martin; Wu, Kun-Lung
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2014
Subjects	Availability; Channels; Data stream processing; Data transmission; Dynamical systems; Dynamics; elasticity; Flow graphs; Indexes; Measurement; Parallel processing; parallelization; Partitioning; Profitability; Run time (computers); Runtime; Safety; Throughput

Summary:	This article addresses the profitability problem in auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization locates regions in the application's data flow graph that can be replicated at run-time to apply data partitioning and thereby achieve scale. To make auto-parallelization effective in practice, the profitability question must be answered: how many parallel channels provide the best throughput? The answer changes with workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that dynamically adjusts the number of channels to achieve high throughput without wasting resources. Most importantly, our solution handles partitioned stateful operators via run-time state migration, which is fully transparent to application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution.
ISSN:	1045-9219, 1558-2183
DOI:	10.1109/TPDS.2013.295