MII: A Novel Content Defined Chunking Algorithm for Finding Incremental Data in Data Synchronization

Bibliographic Details
Published in IEEE Access Vol. 7; pp. 86932-86945
Main Authors Zhang, Changjian; Qi, Deyu; Cai, Zhe; Huang, Wenhao; Wang, Xinyang; Li, Wenlin; Guo, Jing
Format Journal Article
Language English
Published Piscataway IEEE 2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: In data backup systems, full backups incur high bandwidth and processing-time overhead when synchronizing backups with source data, so incremental backup has become a focus of academic and industrial research. Finding the incremental data between a backup and its source data is a key but poorly solved problem for incremental backup. To find the incremental data during the backup process, this paper proposes a novel content-defined chunking algorithm. The source data and the backup data are split in the same way into small, variable-length chunks. A chunk of source data is then judged to be incremental data if it differs from every chunk in the backup data. In experiments, the proposed chunking algorithm is compared with classical and state-of-the-art algorithms. The results show that, at the same chunking throughput, the amount of incremental data found by this algorithm is 13%-34% lower than that found by the others.
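The workflow described in the summary (chunk both sides the same way, then flag source chunks that match no backup chunk) can be sketched as follows. This is a minimal illustration, not the MII algorithm from the paper: the boundary rule is a generic shift-and-add fingerprint chosen for brevity, and the names `chunk_boundaries`, `split_chunks`, `find_incremental`, the parameters `mask`, `min_size`, `max_size`, and the use of SHA-256 digests for chunk comparison are all assumptions made for this example.

```python
import hashlib


def chunk_boundaries(data: bytes, mask: int = 0x3F,
                     min_size: int = 16, max_size: int = 256):
    """Yield (start, end) offsets of variable-length chunks.

    Assumption: a simple shift-and-add fingerprint decides the cut points.
    It is a stand-in for illustration, not the paper's MII rule.
    """
    start = 0
    h = 0
    for i, b in enumerate(data):
        # The low bits of h depend only on the most recent bytes, so cut
        # points follow the content rather than absolute file offsets.
        h = ((h << 1) + b) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)  # trailing chunk, may be shorter than min_size


def split_chunks(data: bytes) -> list:
    """Split data into chunks using the boundary rule above."""
    return [data[s:e] for s, e in chunk_boundaries(data)]


def find_incremental(source: bytes, backup: bytes) -> list:
    """Return the source chunks that match no chunk of the backup.

    Assumption: chunks are compared by SHA-256 digest.
    """
    known = {hashlib.sha256(c).digest() for c in split_chunks(backup)}
    return [c for c in split_chunks(source)
            if hashlib.sha256(c).digest() not in known]


# Usage: a backup and a source that differs by one inserted byte.
backup = bytes((i * 37 + (i >> 3)) % 200 for i in range(4000))
source = backup[:1000] + b"\xff" + backup[1000:]
incremental = find_incremental(source, backup)
print(len(incremental), "incremental chunk(s),",
      sum(len(c) for c in incremental), "bytes to transfer")
```

Because cut points are derived from the bytes themselves, an insertion tends to disturb only nearby chunks instead of shifting every later boundary, which is what keeps the incremental set small. How small depends on the boundary rule, and that is the aspect the paper's algorithm targets.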
ISSN:2169-3536
DOI:10.1109/ACCESS.2019.2926195