SciSpace: A scientific collaboration workspace for geo-distributed HPC data centers

Bibliographic Details
Published in Future Generation Computer Systems Vol. 101; pp. 398-409
Main Authors Khan, Awais; Kim, Taeuk; Byun, Hyunki; Kim, Youngjae
Format Journal Article
Language English
Published Elsevier B.V. 01.12.2019
Summary: Future terabit networks promise to dramatically improve big data movement between geographically dispersed HPC data centers. The scientific community is taking advantage of terabit networks such as DOE’s ESnet, accelerating the trend toward a small world of collaboration between geospatial HPC data centers. Such collaboration improves information and resource sharing for joint simulation and analysis between the HPC data centers. However, effective collaboration faces several challenges: a collective view of multi-site shared data, minimal performance degradation of scientific applications running in such collaboration environments and, most critical of all, data sharing policies for such collaborations. In this paper, we propose SciSpace, a Scientific Collaboration Workspace for collaborative data centers. It provides a global view of information shared from multiple geo-distributed HPC data centers under a single workspace. SciSpace supports native data access to achieve high performance when data must be read or written in the native data center namespace. This is accomplished by integrating an on-demand metadata export protocol. To optimize scientific collaborations across HPC data centers, SciSpace implements a search and discovery service. For evaluation, we configured two geo-distributed small-scale HPC data centers connected via a high-speed InfiniBand network, such as the terabit network of DOE’s ESnet, and equipped with LustreFS. We show the feasibility of SciSpace using real scientific datasets and applications. The evaluation results show an average 36% performance boost when the proposed native data access is employed in collaborations. We also emulate a real climate science collaboration to validate the usefulness of SciSpace.
ISSN: 0167-739X, 1872-7115
DOI: 10.1016/j.future.2019.06.006