Nebula: Distributed Edge Cloud for Data Intensive Computing

Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence, are highly unsuited for geo-distributed data-intensive applications where the data may...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on parallel and distributed systems Vol. 28; no. 11; pp. 3229 - 3242
Main Authors Jonathan, Albert, Ryden, Mathew, Kwangsung Oh, Chandra, Abhishek, Weissman, Jon
Format Journal Article
LanguageEnglish
Published New York IEEE 01.11.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence, are highly unsuited for geo-distributed data-intensive applications where the data may be spread at multiple geographical locations. In this paper, we present Nebula: a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimization techniques including location-aware data and computation placement, replication, and recovery. We evaluate Nebula performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2017.2717883