XORInc: Optimizing Data Repair and Update for Erasure-Coded Systems with XOR-Based In-Network Computation

Bibliographic Details
Published in: 2019 35th Symposium on Mass Storage Systems and Technologies (MSST), pp. 244-256
Main Authors: Wang, Fang; Tang, Yingjie; Xie, Yanwen; Tang, Xuehai
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2019

Summary: Erasure coding is widely used in distributed storage systems because it offers significantly higher storage efficiency than replication at the same fault-tolerance level. However, erasure coding introduces heavy cross-rack traffic because (1) repairing a single failed data block requires reading the other available blocks from multiple nodes, and (2) updating a data block triggers updates to all parity blocks. To alleviate the impact of this traffic on erasure-coding performance, many works design new transmission schemes that increase bandwidth utilization among storage nodes, but these schemes do not actually reduce network traffic. With the emergence of programmable network devices, the concept of in-network computation has been proposed; its key idea is to offload compute operations onto intermediate network devices. Inspired by this idea, we propose XORInc, a framework that uses programmable network devices to XOR data flows from multiple storage nodes, so that XORInc can effectively reduce network traffic (especially cross-rack traffic) and eliminate the network bottleneck. Under XORInc, we design two new transmission schemes, NetRepair and NetUpdate, to optimize the repair and update operations, respectively. We implement XORInc based on HDFS-RAID and SDN to simulate an in-network computation framework. Experiments on a local testbed show that NetRepair reduces the repair time to nearly the normal read time and reduces network traffic by up to 41%, while NetUpdate reduces update time and traffic by up to 74% and 30%, respectively.
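The repair and update operations described above both reduce to XOR over equal-length blocks, which is what makes them candidates for offloading to network devices. The sketch below illustrates that arithmetic under simplifying assumptions that do not come from the paper.

- It uses a single XOR parity block (RAID-5 style) rather than a general Reed-Solomon code.
- It assumes a hypothetical rack layout in which a switch XORs the flows leaving each remote rack into one partial result.

The function names (`encode_parity`, `repair`, `update_parity`, `cross_rack_traffic`) are illustrative only and are not XORInc's API. For a Reed-Solomon code, each node would first multiply its block by a GF(2^8) coefficient. The combining step would still be XOR.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_blocks: List[bytes]) -> bytes:
    """Single XOR parity over k data blocks (assumed RAID-5-style code)."""
    return reduce(xor_blocks, data_blocks)


def repair(surviving_blocks: List[bytes]) -> bytes:
    """Recover the one missing block by XORing all surviving blocks
    (the remaining data blocks plus the parity block)."""
    return reduce(xor_blocks, surviving_blocks)


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Delta update for XOR parity: P' = P xor (D_old xor D_new).
    Only the delta is needed; the other data blocks are not re-read."""
    delta = xor_blocks(old_data, new_data)
    return xor_blocks(old_parity, delta)


def cross_rack_traffic(remote_racks: List[int], block_size: int,
                       in_network_xor: bool) -> int:
    """Bytes crossing rack boundaries to reach the repairing node.

    remote_racks[i] is the number of surviving blocks stored in remote
    rack i (hypothetical layout). Without in-network XOR, every remote
    block is sent individually. With a (hypothetical) switch XORing the
    flows leaving each rack, each non-empty remote rack sends one block.
    """
    if in_network_xor:
        return sum(1 for n in remote_racks if n > 0) * block_size
    return sum(remote_racks) * block_size


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
    parity = encode_parity(data)
    # Lose data[1], then rebuild it from the other data blocks and the parity.
    print(repair([data[0], data[2], parity]) == data[1])
    # Overwrite data[1] and patch the parity with only the delta.
    new_parity = update_parity(parity, data[1], b"\xff\x00")
    print(new_parity == encode_parity([data[0], b"\xff\x00", data[2]]))
    print(cross_rack_traffic([2, 3], 64, in_network_xor=False),
          cross_rack_traffic([2, 3], 64, in_network_xor=True))
```

Consider two surviving blocks in one remote rack and three in another. Sending the blocks individually moves five blocks across racks, while per-rack XOR aggregation moves only two. The delta update similarly lets each parity holder apply `D_old xor D_new` without re-reading the other data blocks.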
ISSN: 2160-1968
DOI: 10.1109/MSST.2019.00005