Spark SQL Shuffle task number optimization system based on historical information

Bibliographic Details
Main Authors	ZHAO ZHIFENG, CAO JUNLIANG, LONG YILIN, WANG YONGQIANG, WANG XIAODONG, CHANG YI, XIA JUNSHENG
Format	Patent
Language	Chinese, English
Published	05.04.2024
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

More Information
Summary:	The invention discloses a Spark SQL Shuffle task number optimization system based on historical information, relating to the fields of big data, databases and machine learning. The system comprises an SQL historical operation information extraction module, an SQL historical operation information pre-analysis module, an SQL similarity measurement module, an HBO parameter calculation module and an HBO parameter recommendation service module. A recommendation model based on historical information is introduced into the Spark SQL engine. By analyzing the shuffle running information of historical SQL with a machine learning algorithm, the system calculates tuning parameters and recommends the task number for each shuffle stage, guiding the current SQL to run more efficiently and stably. This achieves dynamic, self-adaptive setting of the shuffle task number and avoids the problems caused by a static shuffle task number.
Bibliography:	Application Number: CN202410013742
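
The summary names the modules but not their algorithms, so the following is only a rough sketch of how a history-based recommendation of this kind might work, not the patented method. Everything in it is an assumption made for illustration: the `HistoryRecord` type, the token-set Jaccard similarity standing in for the SQL similarity measurement module, the similarity-weighted average standing in for the HBO parameter calculation, and the 128 MiB target size per shuffle task. In stock Spark SQL the static setting being replaced would be `spark.sql.shuffle.partitions`.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryRecord:
    """Hypothetical record of one past SQL run (not from the patent)."""
    sql: str
    shuffle_bytes: List[int]  # bytes shuffled in each shuffle stage


def normalize(sql: str) -> frozenset:
    """Lower-case the SQL and replace literals with '?' so that runs
    differing only in constants compare as equal."""
    text = re.sub(r"'[^']*'|\b\d+\b", "?", sql.lower())
    return frozenset(re.findall(r"[a-z_][a-z0-9_.]*|\?", text))


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the normalized token sets (an assumed metric)."""
    ta, tb = normalize(a), normalize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def recommend_task_counts(
    sql: str,
    history: List[HistoryRecord],
    target_bytes: int = 128 * 1024 * 1024,  # assumed target size per task
    min_similarity: float = 0.8,            # assumed match threshold
    max_tasks: int = 10_000,
) -> Optional[List[int]]:
    """Return one recommended task count per shuffle stage, or None when
    no sufficiently similar historical SQL exists."""
    scored = [(similarity(sql, rec.sql), rec) for rec in history]
    matches = [(s, rec) for s, rec in scored if s >= min_similarity]
    if not matches:
        return None  # caller falls back to the static setting

    stages = min(len(rec.shuffle_bytes) for _, rec in matches)
    total_weight = sum(s for s, _ in matches)
    counts = []
    for i in range(stages):
        # Similarity-weighted average of historical shuffle volume.
        avg = sum(s * rec.shuffle_bytes[i] for s, rec in matches) / total_weight
        counts.append(max(1, min(max_tasks, math.ceil(avg / target_bytes))))
    return counts
```

A caller would look up the incoming SQL before execution and apply the returned counts stage by stage, falling back to the static default when the function returns `None`. A production system would presumably compare query plans instead of raw text and learn the target size from run times; the summary does not say.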