FGFS: Feature Guided Frontier Scheduling for SIMT DAGs

In the past decade, heterogeneous multicore architectures with support for Single Instruction Multiple Thread (SIMT) style computing have become the standard platform of choice for scheduling HPC applications. Here, applications are typically modelled as a set of data-parallel tasks with dependencie...

Full description

Saved in:
Bibliographic Details
Published inThe Journal of supercomputing Vol. 78; no. 9; pp. 11702 - 11743
Main Authors Ghose, Anirban, Dey, Soumyajit
Format Journal Article
LanguageEnglish
Published New York Springer US 01.06.2022
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In the past decade, heterogeneous multicore architectures with support for Single Instruction Multiple Thread (SIMT) style computing have become the standard platform of choice for scheduling HPC applications. Here, applications are typically modelled as a set of data-parallel tasks with dependencies represented in the form of a directed acyclic graph (DAG). The relevant execution time information for each constituent task in the DAG is known beforehand and is leveraged by scheduling algorithms (List or Cluster based) to ascertain near-optimal schedules at runtime. However, given an online setting, where applications are submitted by multiple users and the types of applications are not restrictive, the chances of knowing execution time information for every program are highly unlikely. In this context, we propose a class of intelligent algorithms for heterogeneous CPU-GPU platforms that leverage static analysis-assisted machine learning techniques for deciding how device assignments should be made at runtime, thus bypassing the requirement for expensive offline profiling passes. We formalize relevant task-level ranking metrics and discuss how existing scheduling techniques can be adapted for our proposed class of algorithms. We also devise an online cluster scheduling algorithm that supports dynamic task arrival by determining in any given scheduling epoch, mapping decisions for a subset of tasks in a DAG. We perform a detailed comparative analysis between our proposed cluster and list scheduling heuristics via extensive simulation experiments using a variety of heterogeneous multicore platform configurations and observe performance speedups in the range of 1.1–1.5× for cluster scheduling over that of list scheduling.
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-022-04323-8