Spark SQL Shuffle task number optimization system based on historical information
Format | Patent |
---|---|
Language | Chinese, English |
Published | 05.04.2024 |
Summary: | The invention discloses a Spark SQL Shuffle task number optimization system based on historical information, relating to the fields of big data, databases, and machine learning. The system comprises five modules: an SQL historical operation information extraction module, an SQL historical operation information pre-analysis module, an SQL similarity measurement module, an HBO parameter calculation module, and an HBO parameter recommendation service module. A recommendation model based on historical information is introduced into the Spark SQL engine. By analyzing the shuffle runtime information of historical SQL with a machine learning algorithm, the system calculates tuning parameters and recommends a task number for each shuffle stage, guiding the current SQL to run more efficiently and stably. This achieves dynamic, self-adaptive setting of the shuffle task number and avoids the series of problems caused by a static shuffle task number. |
---|---|
Bibliography: | Application Number: CN202410013742 |
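The summary names five modules but does not specify the similarity metric or the model used to calculate parameters. The following is a minimal, hypothetical Python sketch of how such a history-based pipeline could fit together, under these assumptions: each historical run records the shuffle bytes written per stage; "pre-analysis" normalizes SQL text by lowercasing and masking literals; similarity is the Jaccard index over token sets (a swapped-in stand-in for the unspecified metric); and a simple sizing rule (about 128 MiB of shuffle data per task, clamped to a range) stands in for the unspecified machine learning model. All names (`HistoricalRun`, `normalize`, `jaccard`, `tasks_for_stage`, `recommend`) and thresholds are illustrative and do not come from the patent. In Spark itself, the static value such a system would replace is `spark.sql.shuffle.partitions`.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical sketch of a five-module history-based pipeline; names, thresholds
# and the sizing rule are assumptions, not the patented method.


@dataclass
class HistoricalRun:
    """Extraction (assumed shape): one past run and its shuffle bytes per stage."""
    sql: str
    shuffle_bytes_per_stage: list[int]


def normalize(sql: str) -> frozenset[str]:
    """Pre-analysis (assumed): lowercase, mask literals, return the token set."""
    sql = sql.lower()
    sql = re.sub(r"'[^']*'", "?", sql)          # mask string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)  # mask numeric literals
    return frozenset(re.findall(r"[a-z_][a-z0-9_]*|\?", sql))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Similarity measurement: Jaccard index, a stand-in for the unspecified metric."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tasks_for_stage(history_bytes: list[int], target_bytes: int, lo: int, hi: int) -> int:
    """Parameter calculation (assumed rule): size for the peak run, ~target_bytes per task."""
    peak = max(history_bytes)
    return max(lo, min(hi, math.ceil(peak / target_bytes)))


def recommend(sql: str, history: list[HistoricalRun], threshold: float = 0.8,
              target_bytes: int = 128 * 1024 * 1024, lo: int = 1, hi: int = 2000):
    """Recommendation service: task number per shuffle stage, or None without history."""
    query = normalize(sql)
    scored = [(jaccard(query, normalize(run.sql)), run) for run in history]
    similar = [run for score, run in scored if score >= threshold]
    if not similar:
        return None  # no usable history: keep the static spark.sql.shuffle.partitions
    # Combine only runs with the same number of shuffle stages as the best match.
    best = max(scored, key=lambda pair: pair[0])[1]
    n_stages = len(best.shuffle_bytes_per_stage)
    similar = [run for run in similar if len(run.shuffle_bytes_per_stage) == n_stages]
    return [
        tasks_for_stage([run.shuffle_bytes_per_stage[i] for run in similar], target_bytes, lo, hi)
        for i in range(n_stages)
    ]


# Usage with made-up history: two runs of the same query shape, one unrelated query.
GiB, MiB = 1024 ** 3, 1024 ** 2
history = [
    HistoricalRun("SELECT city, count(*) FROM orders WHERE dt = '2024-01-01' GROUP BY city", [3 * GiB, 200 * MiB]),
    HistoricalRun("SELECT city, count(*) FROM orders WHERE dt = '2024-01-02' GROUP BY city", [4 * GiB, 256 * MiB]),
    HistoricalRun("SELECT * FROM users", [10 * MiB]),
]
print(recommend("SELECT city, count(*) FROM orders WHERE dt = '2024-01-03' GROUP BY city", history))
# prints [32, 2]
```

Sizing each stage from the peak rather than the mean historical volume is a conservative choice: when input sizes vary between runs, it leans toward more, smaller tasks. Returning `None` when no sufficiently similar history exists leaves the static setting in place instead of guessing.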