Elastic Scaling for Data Stream Processing
This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitio...
Saved in:
Published in | IEEE transactions on parallel and distributed systems Vol. 25; no. 6; pp. 1447 - 1463 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
01.06.2014
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
ISSN: | 1045-9219 1558-2183 |
DOI: | 10.1109/TPDS.2013.295 |