Supporting Data Compression in PnetCDF
Published in | 2021 IEEE International Conference on Big Data (Big Data) pp. 86 - 97 |
---|---|
Main Authors | Hou, Kaiyuan; Kang, Qiao; Lee, Sunwoo; Agrawal, Ankit; Choudhary, Alok; Liao, Wei-keng |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 15.12.2021 |
Subjects | |
DOI | 10.1109/BigData52589.2021.9671998 |
Abstract | The rapid growth in data volumes has increased the demand for data compression among HPC applications. Although many file systems and I/O middleware layers have incorporated compression features, few high-level parallel I/O libraries support data compression because of the challenges of achieving scalable performance on HPC systems. This paper presents the design and implementation of the variable compression feature in the Parallel NetCDF (PnetCDF) library. Our design employs the same chunking concept used by the HDF5 library, but we focus on enabling I/O aggregation across multiple requests to address the performance and scalability challenges. We evaluate our solution using the I/O kernels of real-world scientific applications and analyze the impact of data compression on parallel I/O performance. Our results suggest that handling multiple requests at once can significantly improve parallel I/O performance on chunked and compressed data. |
---|---|
Author | Agrawal, Ankit; Hou, Kaiyuan; Kang, Qiao; Lee, Sunwoo; Choudhary, Alok; Liao, Wei-keng |
Author_xml | – sequence: 1 givenname: Kaiyuan surname: Hou fullname: Hou, Kaiyuan email: khl7265@ece.northwestern.edu organization: Northwestern University – sequence: 2 givenname: Qiao surname: Kang fullname: Kang, Qiao email: qiao.kang@ece.northwestern.edu organization: Northwestern University – sequence: 3 givenname: Sunwoo surname: Lee fullname: Lee, Sunwoo email: slz839@ece.northwestern.edu organization: Northwestern University – sequence: 4 givenname: Ankit surname: Agrawal fullname: Agrawal, Ankit email: ankitag@ece.northwestern.edu organization: Northwestern University – sequence: 5 givenname: Alok surname: Choudhary fullname: Choudhary, Alok email: choudhar@ece.northwestern.edu organization: Northwestern University – sequence: 6 givenname: Wei-keng surname: Liao fullname: Liao, Wei-keng email: wkliao@ece.northwestern.edu organization: Northwestern University |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/BigData52589.2021.9671998 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 1665439025 9781665439022 |
EndPage | 97 |
ExternalDocumentID | 9671998 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Institute of Standards and Technology funderid: 10.13039/100000161 – fundername: National Energy Research Scientific Computing Center funderid: 10.13039/100017223 |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
ID | FETCH-LOGICAL-i203t-86d3caba4a5a988b55be20ce97fc893e51d1d08f6db322e9d4015bc16314b88a3 |
IEDL.DBID | RIE |
IngestDate | Thu Jun 29 18:37:39 EDT 2023 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i203t-86d3caba4a5a988b55be20ce97fc893e51d1d08f6db322e9d4015bc16314b88a3 |
PageCount | 12 |
ParticipantIDs | ieee_primary_9671998 |
PublicationCentury | 2000 |
PublicationDate | 2021-Dec.-15 |
PublicationDateYYYYMMDD | 2021-12-15 |
PublicationDate_xml | – month: 12 year: 2021 text: 2021-Dec.-15 day: 15 |
PublicationDecade | 2020 |
PublicationTitle | 2021 IEEE International Conference on Big Data (Big Data) |
PublicationTitleAbbrev | Big Data |
PublicationYear | 2021 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 86 |
SubjectTerms | Big Data; Chunked Storage Layout; Compression; Data compression; File systems; I/O Aggregation; Layout; Libraries; NetCDF; Scalability; Supercomputers |
Title | Supporting Data Compression in PnetCDF |
URI | https://ieeexplore.ieee.org/document/9671998 |