Very Fast Streaming Submodular Function Maximization

Data summarization has become a valuable tool in understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. Submodular function maximization is a well-studied problem with a variety of algorithms availab...

Full description

Saved in:
Bibliographic Details
Published inMachine Learning and Knowledge Discovery in Databases. Research Track Vol. 12977; pp. 151 - 166
Main Authors Buschjäger, Sebastian, Honysz, Philipp-Jan, Pfahler, Lukas, Morik, Katharina
Format Book Chapter
LanguageEnglish
Published Switzerland Springer International Publishing AG 2021
Springer International Publishing
SeriesLecture Notes in Computer Science
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Data summarization has become a valuable tool in understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. Submodular function maximization is a well-studied problem with a variety of algorithms available. These algorithms usually offer worst-case guarantees to the expense of higher computation and memory requirements. However, many practical applications do not fall under this mathematical worst-case but are usually much more well-behaved. We propose a new submodular function maximization algorithm called ThreeSieves that ignores the worst-case and thus uses fewer resources. Our algorithm selects the most informative items from a data-stream on the fly and maintains a provable performance in most cases on a fixed memory budget. In an extensive evaluation, we compare our method against 6 state-of-the-art algorithms on 8 different datasets including data with and without concept drift. We show that our algorithm outperforms the current state-of-the-art in the majority of cases and, at the same time, uses fewer resources.
ISBN:3030865223
9783030865221
ISSN:0302-9743
1611-3349
DOI:10.1007/978-3-030-86523-8_10