Multi-agent Algorithm for Re-allocating Grid-resources and Improving Fault-tolerance of Problem-solving Processes
Published in | Procedia computer science Vol. 150; pp. 171 - 178 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 2019 |
Summary: | Ensuring the fault tolerance of computational processes in Grid environments is a relevant issue. This paper addresses improving fault tolerance when solving large-scale scientific and applied problems that are implemented through modular programming in heterogeneous distributed computing environments. A computational process is described by an abstract program (problem-solving scheme) that corresponds to a workflow. The problem-solving scheme specifies modules (applied software) and the relations between them. The paper proposes a new multi-agent algorithm for re-allocating Grid resources when the computational process fails. The algorithm forms a residual problem-solving scheme using abstract program specialization methods and reallocates its modules among agents that represent computational resources. Compared with known algorithms for the same purpose, the proposed algorithm implements an adaptive, multi-scenario approach to this problem and therefore increases the degree of fault tolerance of the computational process. Extensive modeling and practical experiments demonstrate the practicability of the proposed algorithm. |
---|---|
ISSN: | 1877-0509 |
DOI: | 10.1016/j.procs.2019.02.034 |
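
The summary describes two steps after a failure: forming a residual problem-solving scheme and reallocating its modules among agents. The following is only a minimal illustrative sketch of that idea, not the paper's algorithm. It assumes the scheme is a dictionary mapping each module to the modules it depends on, that the set of completed modules is known, and that a simple least-loaded rule picks the agent. All names (`residual_scheme`, `reallocate`, the module and agent labels) are hypothetical.

```python
def residual_scheme(scheme, completed):
    """Keep only the modules that still have to run.

    Assumption: `scheme` maps a module name to the list of modules it
    depends on. Dependencies that are already completed are dropped.
    """
    return {
        module: [d for d in deps if d not in completed]
        for module, deps in scheme.items()
        if module not in completed
    }


def reallocate(residual, agents, failed):
    """Assign residual modules to surviving agents in dependency order.

    Assumption: a least-loaded rule stands in for whatever negotiation
    the agents actually perform; ties are broken by agent name.
    """
    load = {a: 0 for a in agents if a not in failed}
    if not load:
        raise RuntimeError("no surviving agents to take over the modules")

    assignment = {}
    remaining = dict(residual)
    while remaining:
        # Modules whose remaining dependencies have all been assigned.
        ready = sorted(
            m for m, deps in remaining.items()
            if all(d in assignment for d in deps)
        )
        if not ready:
            raise ValueError("residual scheme contains a dependency cycle")
        for module in ready:
            agent = min(load, key=lambda a: (load[a], a))
            assignment[module] = agent
            load[agent] += 1
            del remaining[module]
    return assignment


# Hypothetical four-module scheme; m1 finished before agent a2 failed.
scheme = {"m1": [], "m2": ["m1"], "m3": ["m1"], "m4": ["m2", "m3"]}
residual = residual_scheme(scheme, completed={"m1"})
plan = reallocate(residual, agents=["a1", "a2", "a3"], failed={"a2"})
print(residual)
print(plan)
```

In this sketch the failed agent is simply excluded from the candidate set. The adaptive, multi-scenario behaviour mentioned in the summary would replace the fixed least-loaded rule, and its details are not given in this record.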