Moving small files in a networked environment


Bibliographic Details
Published in: Future Generation Computer Systems, Vol. 139
Main Authors: Jin, Chao; Abramson, David Andrew; Carroll, Jake; Liu, Zhengchun; Kettimuthu, Rajkumar
Format: Journal Article
Language: English
Published: United States, Elsevier, 23.09.2022
Summary: Globally distributed computing infrastructures, such as clouds and supercomputers, are currently used to manage data that is generated at unprecedented speed from a variety of sources. As a result, the volume of data exchanged between distant sites has increased substantially. To accelerate data transfer, high-speed networks connect remote sites. Most existing data movement solutions are optimized for moving large files, and transferring a large number of small files across networks remains challenging. This weakness not only lowers data transfer performance but also decreases overall system utilization. Here, we identify that moving small files is constrained mainly by degraded file system throughput, not just by network performance as might be suspected. We have built a data transfer pipeline model to analyze the impact of small network I/O and storage I/O on data movement. Extending GridFTP, a widely used open source data movement solution, we demonstrate several engineering approaches that mitigate the bottleneck and increase data transfer efficiency. We show optimizations that improve data transfer performance by more than a factor of 5. Compared with existing solutions, our approaches save a significant amount of system resources when moving many small files.
Bibliography: AC02-06CH11357; USDOE
ISSN: 0167-739X, 1872-7115
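
The summary attributes slow small-file transfers to file system throughput rather than the network alone. The following is a minimal sketch of that idea, not the authors' pipeline model: it assumes a fully overlapped three-stage pipeline (read, network, write) in which each stage pays a fixed per-file overhead plus a size-dependent cost. The stage names, bandwidths, and overheads are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage with a fixed per-file cost and a streaming bandwidth."""
    name: str
    bandwidth_bytes_per_s: float  # assumed sustained rate for large sequential I/O
    per_file_overhead_s: float    # assumed fixed cost per file (open/close, metadata)

    def time_per_file(self, file_size_bytes: float) -> float:
        return self.per_file_overhead_s + file_size_bytes / self.bandwidth_bytes_per_s


def bottleneck(stages, file_size_bytes):
    """Return the name of the stage that takes longest per file."""
    return max(stages, key=lambda s: s.time_per_file(file_size_bytes)).name


def pipeline_throughput(stages, file_size_bytes):
    """Steady-state bytes/s when stages overlap: limited by the slowest stage."""
    slowest = max(s.time_per_file(file_size_bytes) for s in stages)
    return file_size_bytes / slowest


# Hypothetical numbers, not measurements from the article.
STAGES = [
    Stage("storage read", bandwidth_bytes_per_s=2.0e9, per_file_overhead_s=1.0e-3),
    Stage("network", bandwidth_bytes_per_s=1.25e9, per_file_overhead_s=1.0e-4),
    Stage("storage write", bandwidth_bytes_per_s=2.0e9, per_file_overhead_s=2.0e-3),
]

if __name__ == "__main__":
    for size in (1.0e9, 1.0e4):  # one large file size, one small file size
        print(f"{size:.0e} B files: bottleneck = {bottleneck(STAGES, size)}, "
              f"throughput = {pipeline_throughput(STAGES, size):.3e} B/s")
```

Under these assumed values, the network limits large files, while the per-file storage overhead dominates once files become small, which is the qualitative behavior the summary describes.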