A Hadoop configuration optimization method based on middle platform business operation requirements


Bibliographic Details
Published in: 2023 IEEE International Conference on Sensors, Electronics and Computer Engineering (ICSECE), pp. 1211-1216
Main Authors: Huang, Jingzhi; Huang, Xiaoqiang; Cui, Yan; Ao, Zhiqi
Format: Conference Proceeding
Language: English
Published: IEEE, 18.08.2023
Summary: The middle platform business is key infrastructure for the digital transformation of the power grid. The big data analysis and processing framework that serves the middle platform business has a wide variety of configuration parameters with complex meanings and interrelated effects, which makes it difficult to tune them quickly and accurately and thereby improve middle platform business processing performance. This paper proposes an effective Hadoop configuration parameter tuning method. First, the K-means clustering algorithm is used to classify the resource utilization characteristics of middle platform business operations. Then, a performance model of middle platform business operations is constructed based on the random forest algorithm. Finally, the parameter configuration space is searched for the optimal configuration. Experimental results show that the performance model based on random forest can effectively predict the running time of the application, and that optimizing parameters based on the performance model improves execution efficiency.
DOI:10.1109/ICSECE58870.2023.10263541
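
The record gives no implementation details, so the following is only a minimal sketch of the three-step pipeline the summary describes, written in plain Python. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the resource-utilization profiles, the `toy_runtime` function standing in for measured run times, the two Hadoop parameters chosen (`mapreduce.task.io.sort.mb` and `mapreduce.job.reduces`), the exhaustive grid search, and the small bagged ensemble of regression trees that stands in for a full random forest.

```python
import itertools
import random
import statistics

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))  # squared Euclidean distance

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with farthest-first seeding; returns (centroids, labels)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (unchanged if it has none).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(statistics.fmean(col) for col in zip(*members))
    return centroids, labels

def build_tree(X, y, rng, depth=0, max_depth=4, min_leaf=2):
    """Regression tree that considers a random half of the features at each split."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)  # leaf: mean run time of the samples that reach it
    features = rng.sample(range(len(X[0])), max(1, len(X[0]) // 2))
    best = None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return statistics.fmean(y)
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] <= t]
    hi = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in lo], [y[i] for i in lo], rng, depth + 1, max_depth, min_leaf),
            build_tree([X[i] for i in hi], [y[i] for i in hi], rng, depth + 1, max_depth, min_leaf))

def tree_predict(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are floats
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=30, seed=0):
    """Bagging: each tree is trained on a bootstrap resample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def forest_predict(forest, row):
    return statistics.fmean([tree_predict(tree, row) for tree in forest])

def best_config(forest, space):
    """Score every configuration in a small grid and keep the fastest prediction."""
    return min(itertools.product(*space.values()), key=lambda cfg: forest_predict(forest, cfg))

# Step 1: cluster jobs by (CPU, I/O) utilization. The profiles are made up.
profiles = [(0.90, 0.20), (0.85, 0.25), (0.88, 0.15), (0.20, 0.90), (0.25, 0.80), (0.15, 0.85)]
centroids, labels = kmeans(profiles, k=2)

# Step 2: fit a run-time model for one job class. The parameter choice is illustrative.
space = {"mapreduce.task.io.sort.mb": [100, 200, 300, 400],
         "mapreduce.job.reduces": [2, 4, 8, 16]}

def toy_runtime(sort_mb, reduces):
    # Hypothetical stand-in for a measured run time in seconds; not real data.
    return 100 + (sort_mb - 200) ** 2 / 400 + 2 * (reduces - 8) ** 2

X = list(itertools.product(*space.values()))
y = [toy_runtime(*cfg) for cfg in X]
forest = fit_forest(X, y)

# Step 3: pick the configuration with the lowest predicted run time.
best = best_config(forest, space)
print(dict(zip(space, best)), round(forest_predict(forest, best), 1))
```

In practice one would probably fit a separate model for each cluster found in step 1 and replace the exhaustive grid with a search strategy suited to a larger parameter space; the summary does not say which search method the authors used.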