GreedyML: A Parallel Algorithm for Maximizing Constrained Submodular Functions

We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical contexts such as data s...

Full description

Saved in:

Bibliographic Details
Main Authors	Gopal, Shivaram, Ferdous, S M, Maji, Hemanta K, Pothen, Alex
Format	Journal Article
Language	English
Published	15.03.2024
Subjects	Computer Science - Data Structures and Algorithms Computer Science - Distributed, Parallel, and Cluster Computing Computer Science - Learning
Online Access	Get full text

Cover

Loading…

More Information
Summary:	We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical contexts such as data summarization, machine learning, and graph sparsification. Our work builds on the randomized distributed RandGreedi algorithm, proposed by Barbosa, Ene, Nguyen, and Ward (2015). This algorithm computes a distributed solution by randomly partitioning the data among all the processors and then employing \emph{a single} accumulation step in which all processors send their partial solutions to one processor. However, for large problems, the accumulation step exceeds the memory available on a processor, and the processor that performs the accumulation becomes a computational bottleneck. Hence we propose a generalization of the RandGreedi algorithm that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm (in the BSP model). We evaluate the new GreedyML algorithm on three classes of problems, and report results from large-scale data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedi algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm is faster than the RandGreedi algorithm. The observed approximation quality of the solutions computed by the GreedyML algorithm closely matches those obtained by the RandGreedi algorithm on these problems.
DOI:	10.48550/arxiv.2403.10332