A fast deduplication scheme for stored data in distributed storage systems

Bibliographic Details
Main Authors Long, Yuhang; Fu, Yingxun
Format Conference Proceeding
Language English
Published SPIE 31.05.2023

Summary: Data deduplication can effectively reduce data redundancy. However, its write performance is insufficient for existing storage systems because of the additional computation and I/O operations it requires. To improve deduplication speed in a distributed storage system, we propose FastDedup, a fast and effective deduplication scheme that focuses on stored data. FastDedup improves deduplication speed through a deduplication task distribution model and multi-container pool technology. Specifically, the deduplication task distribution model maintains correctness when multiple deduplication nodes work simultaneously, and the multi-container pool technology reduces the time spent in the data merging stage. Evaluation results on three real backup datasets demonstrate that, compared to the unimproved technique, FastDedup increases deduplication throughput by 3.2% to 69.1%.
Bibliography: Conference Date: 2023-02-17 to 2023-02-19
Conference Location: Hangzhou, China
ISBN: 9781510666290, 151066629X
ISSN: 0277-786X
DOI: 10.1117/12.2680561