Improving Collective I/O Performance Using Non-volatile Memory Devices


Bibliographic Details
Published in: 2016 IEEE International Conference on Cluster Computing (CLUSTER), pp. 120-129
Main Authors: Congiu, Giuseppe; Narasimhamurthy, Sai; Suss, Tim; Brinkmann, Andre
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2016
ISSN: 2168-9253
DOI: 10.1109/CLUSTER.2016.37


More Information
Summary: Collective I/O is a parallel I/O technique designed to deliver high-performance data access to scientific applications running on high-end computing clusters. In collective I/O, write performance is highly dependent upon the storage system response time and is limited by the slowest writer. The storage system response time, in conjunction with the global synchronisation required during every round of data exchange and write, severely impacts collective I/O performance. Future Exascale systems will have an increasing number of processor cores, while the number of storage servers will remain relatively small. The storage system concurrency level will therefore further increase, worsening the global synchronisation problem. Nowadays, high-performance computing nodes also have access to locally attached solid-state drives, effectively providing an additional tier in the storage hierarchy. Unfortunately, this tier is not always fully integrated. In this paper we propose a set of MPI-IO hints extensions that enable users to take advantage of fast, locally attached storage devices to boost collective I/O performance by increasing parallelism and reducing the impact of global synchronisation in the ROMIO implementation. We demonstrate that by using local storage resources, collective write performance can be greatly improved compared to the case in which only the global parallel file system is used, but can also decrease if the ratio between aggregators and compute nodes is too small.
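The two-phase collective write pattern the abstract refers to can be illustrated with a toy sketch. This is plain Python with no MPI: the dictionaries stand in for per-rank data, and the "exchange" loop stands in for the all-to-all communication that ROMIO performs; all names here are illustrative, not the paper's API. The comment on the write phase marks where the paper's proposal (staging to node-local SSDs) would apply.

```python
# Toy, MPI-free sketch of two-phase collective I/O.
# process_data maps each rank to its non-contiguous file chunks
# ({offset: bytes}); a few "aggregators" each own a contiguous
# file region and perform the actual writes.

def two_phase_write(process_data, num_aggregators, file_size):
    region = file_size // num_aggregators  # file domain per aggregator

    # Phase 1: exchange -- route every chunk to the aggregator that
    # owns its file region (in MPI this is an all-to-all step, and a
    # point of global synchronisation among all participating ranks).
    staged = [dict() for _ in range(num_aggregators)]
    for rank, chunks in process_data.items():
        for offset, data in chunks.items():
            agg = min(offset // region, num_aggregators - 1)
            staged[agg][offset] = data

    # Phase 2: write -- each aggregator writes its region contiguously.
    # The paper's proposal would let this write land on a node-local
    # SSD tier first, instead of waiting on the shared file system.
    out = bytearray(file_size)
    for agg_chunks in staged:
        for offset, data in sorted(agg_chunks.items()):
            out[offset:offset + len(data)] = data
    return bytes(out)


# Usage: two ranks each hold interleaved 2-byte chunks of an 8-byte
# file; two aggregators reassemble them into contiguous writes.
result = two_phase_write(
    {0: {0: b"AA", 4: b"CC"}, 1: {2: b"BB", 6: b"DD"}},
    num_aggregators=2,
    file_size=8,
)
```

Because every rank must hand off its chunks before any aggregator can write, each round couples all writers together; this is the synchronisation cost the paper's local-storage staging aims to shorten.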