Harmonizing Repair and Maintenance in LRC-Coded Storage
Published in | Proceedings - Symposium on Reliable Distributed Systems, pp. 1-11 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 30.09.2024 |
ISSN | 2575-8462 |
DOI | 10.1109/SRDS64841.2024.00012 |
Summary: | Modern storage systems not only introduce data redundancy for fault tolerance, but also conduct regular maintenance operations on storage nodes for system robustness. Erasure coding provides storage-efficient redundancy and has been widely deployed in production, yet it also incurs substantial bandwidth and I/O overhead due to the repair of storage failures. In particular, maintenance operations make storage nodes temporarily unavailable and lead to data unavailability, thereby incurring repair overhead for erasure-coded storage. In this paper, we study Locally Repairable Codes (LRCs), a class of practical repair-efficient erasure codes, and show that there exists an inherent performance trade-off between the repair and maintenance operations of LRCs in data center settings, such that the repair performance in regular (i.e., no-maintenance) and maintenance modes cannot be simultaneously optimized. To this end, we design a configurable data placement scheme that operates along the trade-off subject to fault-tolerance constraints. We prototype our data placement scheme atop Hadoop HDFS and show how it balances the performance trade-off of repair and maintenance operations in real network environments. |
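The summary mentions LRC repair and data placement without giving details. The following is a toy illustration only, not the paper's code, placement scheme, or analysis. It assumes each local group carries a single XOR local parity (global parities are omitted), that blocks are placed on racks, and that a maintenance operation takes one whole rack offline. All function names, block IDs, and rack layouts are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_local_groups(data_blocks, group_size):
    """Split data blocks into local groups and append one XOR local parity per group.

    Toy assumption: each local parity is the XOR of its group's data blocks;
    global parities are omitted here.
    """
    groups = []
    for start in range(0, len(data_blocks), group_size):
        members = list(data_blocks[start:start + group_size])
        groups.append(members + [xor_blocks(members)])
    return groups


def local_repair(group, lost_index):
    """Rebuild one lost block by XORing the surviving blocks of its local group."""
    survivors = [blk for i, blk in enumerate(group) if i != lost_index]
    return xor_blocks(survivors)


def cross_rack_reads(placement, group_ids, lost_id, target_rack):
    """Count surviving group blocks that must be fetched from a rack other than target_rack."""
    return sum(1 for b in group_ids
               if b != lost_id and placement[b] != target_rack)


def unavailable_in_group(placement, group_ids, down_rack):
    """Count group blocks made unavailable when one rack is offline (hypothetical maintenance)."""
    return sum(1 for b in group_ids if placement[b] == down_rack)


# Hypothetical data: four 2-byte data blocks in two local groups of size 2.
data = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
groups = build_local_groups(data, group_size=2)
assert local_repair(groups[0], lost_index=0) == data[0]

# Hypothetical placements of one local group (two data blocks plus local parity).
group = ["d0", "d1", "p0"]
compact = {"d0": "A", "d1": "A", "p0": "A"}  # whole group in rack A
spread = {"d0": "A", "d1": "B", "p0": "C"}   # one block per rack

print(cross_rack_reads(compact, group, "d0", "A"), unavailable_in_group(compact, group, "A"))  # 0 3
print(cross_rack_reads(spread, group, "d0", "A"), unavailable_in_group(spread, group, "A"))    # 2 1
```

Under these assumptions, keeping a local group within one rack lets a single failed block be rebuilt with no cross-rack reads, but taking that rack offline removes all three blocks of the group at once, which one local parity cannot recover. Spreading the group costs two cross-rack reads per repair but leaves at most one block of the group unavailable. This only hints at why placement matters for LRCs; the paper's own trade-off analysis and configurable placement scheme are not reproduced here.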