KORDI: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming

Bibliographic Details
Published in: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 337-339
Main Authors: Kordelas, Athanasios; Spyrou, Thanasis; Voulgaris, Spyros; Megalooikonomou, Vasileios; Deligiannis, Nikos
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2023
DOI: 10.1109/ISPASS57527.2023.00045

Summary: Apache Spark is one of the most commonly used frameworks for Big Data processing. Research on Spark's built-in dynamic resource allocation feature for streaming has shown that large fluctuations in data load, for instance in website traffic, have a negative impact on automatic scaling. Research has also indicated that the root cause of this issue is the lack of data load prediction, which aims to identify the expected increase in data load during peak hours or days. Hence, this paper proposes an enhanced solution, KORDI (Knowledge-based Orchestrated Resource Distribution), which aims to optimise the allocation of Spark resources for Streaming applications in real time using a SARIMAX model. The experimental evaluation shows that the proposed solution reduces cost by 38% without affecting stability.
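The idea of sizing resources from a predicted load, rather than reacting to the current load, can be illustrated with a minimal sketch. This is not KORDI's implementation: the summary states that KORDI uses a SARIMAX model, whereas the sketch below substitutes a much simpler seasonal-naive forecast (repeat the last season) to stay dependency-free. All function names, the per-executor capacity, the headroom factor, and the executor bounds are hypothetical values chosen for illustration.

```python
import math


def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the last full season.

    A deliberately simple stand-in for a SARIMAX forecaster (assumption).
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def executors_needed(records_per_sec, per_executor_capacity,
                     min_executors=1, max_executors=50, headroom=1.2):
    """Translate a predicted load into an executor count.

    per_executor_capacity, headroom and the bounds are hypothetical
    tuning values, not figures from the paper.
    """
    target = math.ceil(records_per_sec * headroom / per_executor_capacity)
    return max(min_executors, min(max_executors, target))


def plan_executors(history, season_length, horizon, per_executor_capacity):
    """Produce an executor plan for the next `horizon` intervals."""
    forecast = seasonal_naive_forecast(history, season_length, horizon)
    return [executors_needed(load, per_executor_capacity) for load in forecast]


if __name__ == "__main__":
    # Two "days" of four load samples each (records per second, made up).
    load_history = [100, 400, 900, 300, 100, 400, 900, 300]
    print(plan_executors(load_history, season_length=4, horizon=4,
                         per_executor_capacity=200))
```

Because the plan is computed ahead of the peak, executors can be requested before the load arrives. A real deployment would replace the forecaster with a fitted seasonal model and calibrate the capacity figure from measurements.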