Multi-agent Algorithm for Re-allocating Grid-resources and Improving Fault-tolerance of Problem-solving Processes

Bibliographic Details
Published in	Procedia Computer Science, Vol. 150, pp. 171-178
Main Authors	Feoktistov, Alexander, Kostromin, Roman, Sidorov, Ivan, Gorsky, Sergey, Oparin, Gennady
Format	Journal Article
Language	English
Published	Elsevier B.V., 2019
Subjects	fault-tolerance; Grid; large-scale problems; multi-agent management; workflow
Summary:	Ensuring the fault tolerance of computational processes in Grid environments is a pressing issue. This paper addresses improving fault tolerance when solving large-scale scientific and applied problems that are implemented through modular programming in heterogeneous distributed computing environments. A computational process is described by an abstract program (a problem-solving scheme) that corresponds to a workflow. The problem-solving scheme specifies the modules (applied software) and the relations between them. The paper proposes a new multi-agent algorithm for re-allocating Grid resources when the computational process fails. The algorithm forms a residual problem-solving scheme using abstract program specialization methods and reallocates its modules among agents that represent computational resources. Compared with known algorithms for the same purpose, the proposed algorithm implements adaptive multi-scenario handling of this issue and therefore increases the degree of fault tolerance of the computational process. Extensive modeling and practical experiments demonstrate the practicability of the proposed algorithm.
ISSN:	1877-0509
DOI:	10.1016/j.procs.2019.02.034
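
Illustrative sketch: the summary describes forming a residual problem-solving scheme after a failure and reallocating its modules among agents. The Python below is a minimal illustration of that general idea only, not the authors' algorithm. It assumes the scheme is a dictionary mapping each module to the modules it depends on, and it substitutes a simple least-loaded greedy rule for the paper's adaptive multi-scenario agent behaviour. All names (residual_scheme, topological_order, reallocate, node1, and so on) are hypothetical.

```python
def residual_scheme(scheme, completed):
    """Keep only unfinished modules and drop dependencies already satisfied."""
    return {
        module: [d for d in deps if d not in completed]
        for module, deps in scheme.items()
        if module not in completed
    }


def topological_order(scheme):
    """Order modules so that each one follows its remaining prerequisites."""
    remaining = {m: set(deps) for m, deps in scheme.items()}
    order = []
    while remaining:
        ready = sorted(m for m, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency in scheme")
        for m in ready:
            order.append(m)
            del remaining[m]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order


def reallocate(scheme, completed, agents, failed):
    """Assign each residual module to the least-loaded surviving agent.

    Assumption: `agents` maps an agent name to its current load (a number).
    """
    alive = {a: load for a, load in agents.items() if a not in failed}
    if not alive:
        raise RuntimeError("no surviving agents")
    assignment = {}
    for module in topological_order(residual_scheme(scheme, completed)):
        agent = min(alive, key=lambda a: (alive[a], a))  # ties broken by name
        assignment[module] = agent
        alive[agent] += 1
    return assignment


# Hypothetical four-module workflow; node2 fails after two modules finish.
scheme = {
    "prep": [],
    "solve_a": ["prep"],
    "solve_b": ["prep"],
    "merge": ["solve_a", "solve_b"],
}
plan = reallocate(
    scheme,
    completed={"prep", "solve_a"},
    agents={"node1": 0, "node2": 1, "node3": 0},
    failed={"node2"},
)
print(plan)
```

In this example the residual scheme contains only solve_b and merge, and neither is placed on the failed node. A real Grid scheduler would also weigh resource heterogeneity, data transfer, and queue state, which this sketch leaves out.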