DenKv: Addressing Design Trade-offs of Key-value Stores for Scientific Applications

Bibliographic Details
Published in 2022 IEEE/ACM International Parallel Data Systems Workshop (PDSW), pp. 20-25
Main Authors Jamil, Safdar; Khan, Awais; Kim, Kihyun; Lee, Jae-Kook; An, Dosik; Hong, Taeyoung; Oral, Sarp; Kim, Youngjae
Format Conference Proceeding
Language English
Published IEEE 01.11.2022
Summary: High-performance computing (HPC) facilities have deployed a flash-based storage tier near the compute nodes to absorb the high I/O demand of HPC applications during periodic system-level checkpoints. To accelerate these checkpoints, proxy-based distributed key-value stores (PD-KVS) have gained particular attention for their flexibility in supporting multiple backends and different network configurations. PD-KVS rely internally on monolithic KVS, such as LevelDB or RocksDB, to exploit the KV interface and query support. However, PD-KVS are unaware of the high redundancy in checkpoint data, which can range from GBs to TBs, and therefore tend to generate high write and space amplification on these storage layers. In this paper, we propose DenKv, a deduplication-extended, node-local, LSM-tree-based KVS. DenKv employs asynchronous partially inline dedup (APID) and aims to maintain the performance characteristics of LSM-tree-based KVS while reducing write and space amplification. We implemented DenKv atop BlobDB and show that it maintains performance while reducing write amplification by up to 2× and space amplification by 4× on average.
DOI:10.1109/PDSW56643.2022.00009
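
The summary's point about redundant checkpoint data can be illustrated with a minimal sketch. This is not DenKv's APID scheme, which the record does not describe in detail; it is a simple synchronous, whole-value deduplication over an in-memory dictionary, and the class `DedupKV`, its methods, and the key names are all hypothetical. It only shows why storing identical values once lowers the ratio of stored bytes to logical bytes, which is one common way to express space amplification.

```python
import hashlib


class DedupKV:
    """Toy in-memory KV store that keeps each distinct value once.

    Hypothetical illustration only: whole-value, synchronous dedup,
    not the asynchronous partially inline dedup (APID) used by DenKv.
    """

    def __init__(self):
        self.index = {}  # key -> content fingerprint
        self.blobs = {}  # fingerprint -> value bytes (stored once)
        self.refs = {}   # fingerprint -> number of keys pointing at it

    def put(self, key: str, value: bytes) -> None:
        fp = hashlib.sha256(value).hexdigest()
        old = self.index.get(key)
        if old == fp:
            return  # same key, same content: nothing to write
        if fp not in self.blobs:
            self.blobs[fp] = value  # first copy of this content
            self.refs[fp] = 0
        self.refs[fp] += 1
        self.index[key] = fp
        if old is not None:
            self._release(old)  # key was overwritten with new content

    def _release(self, fp: str) -> None:
        self.refs[fp] -= 1
        if self.refs[fp] == 0:  # no key refers to this content any more
            del self.refs[fp]
            del self.blobs[fp]

    def get(self, key: str) -> bytes:
        return self.blobs[self.index[key]]

    def logical_bytes(self) -> int:
        # Bytes the application believes it has stored.
        return sum(len(self.blobs[fp]) for fp in self.index.values())

    def stored_bytes(self) -> int:
        # Bytes actually held after deduplication.
        return sum(len(v) for v in self.blobs.values())


def demo():
    kv = DedupKV()
    block = b"x" * 1024
    # Four checkpoints that each rewrite the same unchanged block.
    for step in range(4):
        kv.put(f"ckpt{step}/unchanged", block)
    kv.put("ckpt3/changed", b"y" * 1024)
    return kv.logical_bytes(), kv.stored_bytes()
```

In this toy run, five 1 KiB values are written but only two distinct blobs are kept. A real LSM-tree-based store would also have to account for compaction and metadata overhead, which this sketch ignores.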