Efficient iterative programs with distributed data collections

Bibliographic Details
Published in Journal of logical and algebraic methods in programming Vol. 144; pp. 101047 - 36
Main Authors Chlyah, Sarah, Gesbert, Nils, Genevès, Pierre, Layaïda, Nabil
Format Journal Article
Language English
Published Elsevier Inc 01.03.2025
Elsevier
Summary: Big data programming frameworks have become increasingly important for the development of applications for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way that allows for analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first-class citizen and show how it enables new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations.
ISSN: 2352-2208
DOI: 10.1016/j.jlamp.2025.101047