Supporting Data Compression in PnetCDF

Bibliographic Details
Published in 2021 IEEE International Conference on Big Data (Big Data), pp. 86-97
Main Authors Hou, Kaiyuan; Kang, Qiao; Lee, Sunwoo; Agrawal, Ankit; Choudhary, Alok; Liao, Wei-keng
Format Conference Proceeding
Language English
Published IEEE 15.12.2021
DOI 10.1109/BigData52589.2021.9671998

Abstract The dramatic growth in data volumes has recently driven up the demand for data compression among HPC applications. Although many file systems and I/O middleware layers have incorporated compression features, few high-level parallel I/O libraries support data compression because of the challenges of achieving scalable performance on HPC systems. This paper presents the design and implementation of the variable compression feature in the Parallel NetCDF (PnetCDF) library. Our design employs the same chunking concept used by the HDF5 library, but we focus on enabling I/O aggregation across multiple requests to address the performance and scalability challenges. We evaluate our solution using the I/O kernel of real-world scientific applications and analyze the impact of data compression on parallel I/O performance. Our results suggest that handling multiple requests at once can significantly improve parallel I/O performance on chunked and compressed data.
Author_xml – sequence: 1
  givenname: Kaiyuan
  surname: Hou
  fullname: Hou, Kaiyuan
  email: khl7265@ece.northwestern.edu
  organization: Northwestern University
– sequence: 2
  givenname: Qiao
  surname: Kang
  fullname: Kang, Qiao
  email: qiao.kang@ece.northwestern.edu
  organization: Northwestern University
– sequence: 3
  givenname: Sunwoo
  surname: Lee
  fullname: Lee, Sunwoo
  email: slz839@ece.northwestern.edu
  organization: Northwestern University
– sequence: 4
  givenname: Ankit
  surname: Agrawal
  fullname: Agrawal, Ankit
  email: ankitag@ece.northwestern.edu
  organization: Northwestern University
– sequence: 5
  givenname: Alok
  surname: Choudhary
  fullname: Choudhary, Alok
  email: choudhar@ece.northwestern.edu
  organization: Northwestern University
– sequence: 6
  givenname: Wei-keng
  surname: Liao
  fullname: Liao, Wei-keng
  email: wkliao@ece.northwestern.edu
  organization: Northwestern University
EISBN 1665439025
9781665439022
GrantInformation_xml – fundername: National Institute of Standards and Technology
  funderid: 10.13039/100000161
– fundername: National Energy Research Scientific Computing Center
  funderid: 10.13039/100017223
SubjectTerms Big Data
Chunked Storage Layout
Compression
Data compression
File systems
I/O Aggregation
Layout
Libraries
NetCDF
Scalability
Supercomputers
URI https://ieeexplore.ieee.org/document/9671998