Supporting Data Compression in PnetCDF

Bibliographic Details
Published in: 2021 IEEE International Conference on Big Data (Big Data), pp. 86-97
Main Authors: Hou, Kaiyuan; Kang, Qiao; Lee, Sunwoo; Agrawal, Ankit; Choudhary, Alok; Liao, Wei-keng
Format: Conference Proceeding
Language: English
Published: IEEE, 15.12.2021
DOI: 10.1109/BigData52589.2021.9671998

Summary: The dramatic growth in data volumes has recently driven up the demand for data compression among HPC applications. Although many file systems and I/O middleware layers have incorporated compression features, few high-level parallel I/O libraries support data compression, owing to the challenges of achieving scalable performance on HPC systems. This paper presents the design and implementation of the variable compression feature in the Parallel NetCDF (PnetCDF) library. Our design employs the same chunking concept used by the HDF5 library, but we focus on enabling I/O aggregation across multiple requests to address the performance and scalability challenges. We evaluate our solution using the I/O kernels of real-world scientific applications and analyze the impact of data compression on parallel I/O performance. Our results suggest that handling multiple requests at once can significantly improve parallel I/O performance on chunked and compressed data.
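The benefit of aggregating requests on chunked, compressed data can be illustrated with a small serial sketch. It is not the paper's implementation and does not use the PnetCDF API: the `ChunkedStore` class, its method names, the 8-byte chunk size, and the use of zlib as the compressor are all assumptions made for illustration. The sketch counts how many decompress/recompress round trips are needed when write requests are applied one at a time versus grouped by chunk first.

```python
import zlib
from collections import defaultdict

CHUNK = 8  # bytes per chunk (hypothetical size, chosen for illustration)


class ChunkedStore:
    """In-memory stand-in for a chunked, compressed 1-D byte variable."""

    def __init__(self, length):
        self.length = length
        nchunks = (length + CHUNK - 1) // CHUNK
        # Every chunk starts as compressed zeros.
        self.chunks = [zlib.compress(bytes(CHUNK)) for _ in range(nchunks)]
        self.codec_calls = 0  # decompress + recompress round trips

    def _update_chunk(self, idx, pieces):
        """Decompress one chunk, apply all pieces, recompress once."""
        buf = bytearray(zlib.decompress(self.chunks[idx]))
        for start, data in pieces:
            buf[start:start + len(data)] = data
        self.chunks[idx] = zlib.compress(bytes(buf))
        self.codec_calls += 1

    @staticmethod
    def _split(offset, data):
        """Yield (chunk index, offset within chunk, bytes) for one request."""
        pos = 0
        while pos < len(data):
            idx, start = divmod(offset + pos, CHUNK)
            n = min(CHUNK - start, len(data) - pos)
            yield idx, start, data[pos:pos + n]
            pos += n

    def write_one_by_one(self, requests):
        """Each request pays its own round trip for every chunk it touches."""
        for offset, data in requests:
            for idx, start, piece in self._split(offset, data):
                self._update_chunk(idx, [(start, piece)])

    def write_aggregated(self, requests):
        """Group all requests by chunk so each chunk is processed once."""
        per_chunk = defaultdict(list)
        for offset, data in requests:
            for idx, start, piece in self._split(offset, data):
                per_chunk[idx].append((start, piece))
        for idx, pieces in per_chunk.items():
            self._update_chunk(idx, pieces)

    def read_all(self):
        raw = b"".join(zlib.decompress(c) for c in self.chunks)
        return raw[:self.length]


if __name__ == "__main__":
    # Requests are assumed to lie within the variable's bounds.
    requests = [(0, b"ab"), (2, b"cd"), (6, b"efgh"), (4, b"ij")]
    naive, aggregated = ChunkedStore(16), ChunkedStore(16)
    naive.write_one_by_one(requests)
    aggregated.write_aggregated(requests)
    print(naive.codec_calls, aggregated.codec_calls)  # prints "5 2"
```

Both strategies leave the same bytes in the store; only the number of compression round trips differs. A parallel library would additionally have to coordinate chunk ownership and file offsets across processes, which this sketch leaves out.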