StreamCloud: A Large Scale Data Streaming System

Data streaming has become an important paradigm for the real-time processing of continuous data flows in domains such as finance, telecommunications, networking, Some applications in these domains require to process massive data flows that current technology is unable to manage, that is, streams tha...

Full description

Saved in:
Bibliographic Details
Published in2010 IEEE 30th International Conference on Distributed Computing Systems pp. 126 - 137
Main Authors Gulisano, Vincenzo, Jimenez-Peris, Ricardo, Patino-Martinez, Marta, Valduriez, Patrick
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.06.2010
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Data streaming has become an important paradigm for the real-time processing of continuous data flows in domains such as finance, telecommunications, networking, Some applications in these domains require to process massive data flows that current technology is unable to manage, that is, streams that, even for a single query operator, require the capacity of potentially many machines. Research efforts on data streaming have mainly focused on scaling in the number of queries or query operators, but overlooked the scalability issue with respect to the stream volume. In this paper, we present StreamCloud a large scale data streaming system for processing large data stream volumes. We focus on how to parallelize continuous queries to obtain a highly scalable data streaming infrastructure. StreamCloud goes beyond the state of the art by using a novel parallelization technique that splits queries into subqueries that are allocated to independent sets of nodes in a way that minimizes the distribution overhead. StreamCloud is implemented as a middleware and is highly independent of the underlying data streaming engine. We explore and evaluate different strategies to parallelize data streaming and tackle with the main bottlenecks and overheads to achieve scalability. The paper presents the system design, implementation and a thorough evaluation of the scalability of the fully implemented system.
ISBN:142447261X
9781424472611
ISSN:1063-6927
DOI:10.1109/ICDCS.2010.72