The Heuristic Algorithm For Symmetric Horizontal Data Distribution

Bibliographic Details
Published in	IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference pp. 2161 - 2165
Main Authors	Munerman, Victor; Munerman, Daniel; Samoilova, Tatyana
Format	Conference Proceeding
Language	English
Published	IEEE 26.01.2021
Subjects	Big Data; Computational complexity; Computer languages; Data warehouses; heuristic algorithm; Heuristic algorithms; Linear programming; optimal distribution; parallel programming; Program processors
ISSN	2376-6565
DOI	10.1109/ElConRus51938.2021.9396510

More Information
Summary:	The article considers an algorithm for the optimal distribution of "objects" of an arbitrary nature among "storages", the essence of which is determined by the subject area. Some subject areas for which the optimal distribution problem is relevant are considered. In particular, the authors consider the problem of accelerating the Join operation. In parallel processing of big data, the Join operation requires uniform distribution of data among the cluster processors: a parallel implementation of the Join operation is effective only when the computational complexities of its execution on all database fragments differ minimally from each other. The optimality criterion should therefore ensure uniform distribution of data. A detailed description of the heuristic optimal distribution algorithm is given, and objective functions for the problems under consideration are proposed. The experiments used to assess the quality of the heuristic greedy optimal distribution algorithm are described. These experiments yielded the dependence of the algorithm's execution time on the number of distributed objects, and the dependence of the distribution quality (the difference between the maximum and minimum storage capacity) on the number of storages and the interval of object weight values. It is shown that the algorithm is quite simple and can be easily implemented in any programming language. The running time of the algorithm is small even for big data, which allows it to be used effectively when preparing data for the parallel solution of problems with high computational complexity. The algorithm shows good results when distributing operand tables across data warehouses: the largest storage capacity differs from the smallest by a small amount.
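
The summary does not reproduce the authors' algorithm, so the following is only an illustrative sketch of the general idea it describes: a greedy heuristic that spreads weighted objects over storages so that the loads end up close to each other. It uses the classic "largest object first, into the currently lightest storage" rule, which is an assumption and may differ from the method in the paper. The names distribute_greedy and spread are hypothetical.

```python
import heapq


def distribute_greedy(weights, num_storages):
    """Greedy sketch (assumed rule, not taken from the paper):
    place objects in descending weight order, each into the storage
    whose current total weight is smallest.

    Returns (assignment, loads), where assignment[s] lists the object
    indices placed in storage s and loads[s] is their total weight.
    """
    if num_storages <= 0:
        raise ValueError("num_storages must be positive")

    # Min-heap of (current load, storage index); the lightest storage is on top.
    heap = [(0, s) for s in range(num_storages)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_storages)]

    # Visit object indices from heaviest to lightest.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for obj in order:
        load, s = heapq.heappop(heap)
        assignment[s].append(obj)
        heapq.heappush(heap, (load + weights[obj], s))

    loads = [sum(weights[i] for i in objs) for objs in assignment]
    return assignment, loads


def spread(loads):
    """Quality measure named in the summary: largest load minus smallest load."""
    return max(loads) - min(loads)


if __name__ == "__main__":
    weights = [8, 7, 6, 5, 4]
    assignment, loads = distribute_greedy(weights, 2)
    print(assignment, loads, spread(loads))
```

For the weights 8, 7, 6, 5, 4 and two storages, this rule produces loads of 17 and 13, while a perfectly even split (15 and 15) exists. That gap is what the "heuristic" label implies: the result is fast to compute and usually close to balanced, but not guaranteed optimal.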