MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems

Bibliographic Details
Published in: Procedia Computer Science, Vol. 51, pp. 2678-2682
Main Authors: Zafar, Hamza; Khan, Farrukh Aftab; Carpenter, Bryan; Shafi, Aamir; Malik, Asad Waqar
Format: Journal Article
Language: English
Published: Elsevier B.V., 2015
Summary: Many organizations, including academic, research, and commercial institutions, have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. Meanwhile, the Apache Hadoop software, which emerged in 2005, has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Recognizing the importance of Big Data, a growing number of organizations are investing in relatively inexpensive Hadoop clusters to run their mission-critical data processing applications. One consequence is that system administrators at these sites may have to maintain two parallel facilities, one for HPC and one for Hadoop computations. This is not ideal, because it duplicates maintenance work and is uneconomical. This paper attempts to bridge this gap by allowing HPC and Hadoop jobs to coexist on a single hardware facility. We achieve this goal by exploiting YARN (Hadoop v2.0), which decouples the computation and resource scheduling parts of the Hadoop framework from the Hadoop Distributed File System (HDFS). In this context, we have developed a YARN-based reference runtime system for the MPJ Express software that allows parallel MPI-like Java applications to be executed on Hadoop clusters. The main contribution of this paper is to provide the Big Data community with access to MPI-like programming using MPJ Express. As a side benefit, this work allows parallel Java applications to perform computations on data stored in HDFS.
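The summary refers to MPI-like parallel Java programs, in which every process runs the same code and uses its rank (its index within the job) to select its share of the work. As a hedged illustration of that programming model only, and not of the MPJ Express API or the paper's YARN runtime, the sketch below simulates ranks with plain Java threads inside one JVM. The class and method names (`RankedSum`, `blockBounds`, `parallelSum`) are hypothetical. In an actual MPJ Express program each rank would be a separate process that calls `MPI.Init(args)` and then obtains its rank and the job size from `MPI.COMM_WORLD.Rank()` and `MPI.COMM_WORLD.Size()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical sketch: simulates an SPMD (MPI-like) job in a single JVM.
 * Each thread plays one "rank", sums its own block of the data, and stores
 * a partial result. The caller then combines the partials after join(),
 * standing in for a reduce to rank 0. This is NOT the MPJ Express API.
 */
public class RankedSum {

    /** Half-open block [start, end) for a rank; the first (n % size) ranks get one extra element. */
    static int[] blockBounds(int n, int rank, int size) {
        int base = n / size;
        int extra = n % size;
        int start = rank * base + Math.min(rank, extra);
        int len = base + (rank < extra ? 1 : 0);
        return new int[] {start, start + len};
    }

    /** Runs `size` simulated ranks over `data` and returns the combined sum. */
    static long parallelSum(long[] data, int size) {
        AtomicLongArray partials = new AtomicLongArray(size);
        List<Thread> ranks = new ArrayList<>();
        for (int r = 0; r < size; r++) {
            final int rank = r;
            // Every simulated rank runs the same code; only `rank` differs.
            Thread t = new Thread(() -> {
                int[] b = blockBounds(data.length, rank, size);
                long local = 0;
                for (int i = b[0]; i < b[1]; i++) {
                    local += data[i];
                }
                partials.set(rank, local);
            });
            ranks.add(t);
            t.start();
        }
        // Wait for all ranks, then combine partial results ("reduce").
        for (Thread t : ranks) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for ranks", e);
            }
        }
        long total = 0;
        for (int r = 0; r < size; r++) {
            total += partials.get(r);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] data = new long[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // prints 55
    }
}
```

Running `main` prints 55, the sum of 1 through 10 split across four simulated ranks (blocks of 3, 3, 2, and 2 elements). Block partitioning is used because each rank can compute its own bounds from its rank and the job size alone, without coordination; a job reading from HDFS might instead assign input splits to ranks, which this sketch does not model.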
ISSN: 1877-0509
DOI: 10.1016/j.procs.2015.05.379