SideWalk: A Facility of Lightweight Out-of-Band Communications for Augmenting Distributed Data Processing Flows

Bibliographic Details
Published in 2015 IEEE International Conference on Cluster Computing, pp. 246-249
Main Authors Yin Huai, Yuan Yuan, Rubao Lee, Xiaodong Zhang
Format Conference Proceeding
Language English
Published IEEE 01.09.2015
Summary: The foundation of a data processing engine running on a large cluster is its programming model, which defines data processing operations and data movements. Out-of-band communications are a special kind of communication activity that is not normally defined in the programming model but is often used in ad hoc ways in system development. Existing ad hoc solutions for out-of-band communications are often hard to reuse, error-prone, and not free from unwanted side effects. To address these issues, we have designed and implemented a standalone facility for out-of-band communications called SideWalk. With this facility, users can add out-of-band communication operations to their distributed data flows through a set of reusable APIs. These APIs have well-defined semantics, which minimizes users' chances of writing error-prone programs with SideWalk. To prevent users from introducing unwanted side effects while using SideWalk, we prototype SideWalk to handle lightweight out-of-band communications efficiently, and we restrict the communication patterns that can be conducted through SideWalk without affecting its applicability to typical use cases. Our experimental results show that, in a Hadoop environment, the execution times of distributed data processing flows with out-of-band communications implemented with SideWalk are reduced by up to 1.53 times compared with those of flows whose out-of-band communications are implemented with a representative ad hoc solution.
ISSN: 1552-5244, 2168-9253
DOI: 10.1109/CLUSTER.2015.43