Pebbles: Leveraging Sketches for Processing Voluminous, High Velocity Data Streams

Bibliographic Details
Published in IEEE transactions on parallel and distributed systems Vol. 32; no. 8; pp. 2005-2020
Main Authors Buddhika, Thilina; Pallickara, Sangmi Lee; Pallickara, Shrideep
Format Journal Article
Language English
Published New York IEEE 01.08.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Voluminous, time-series data streams originating in continuous sensing environments pose data ingestion and processing challenges. We present a holistic methodology centered around data sketching to address both challenges. We introduce an order-preserving sketching algorithm that we have designed for space-efficient representation of multi-feature streams with native support for stream processing related operations. Observational streams are preprocessed at the edges of the network generating sketched streams to reduce data transfer costs and energy consumption. Ingested sketched streams are then processed using sketch-aware extensions to existing stream processing APIs delivering improved performance. Our benchmarks with real-world datasets show up to a ~8× reduction in data volumes transferred and a ~27× improvement in throughput.
ISSN: 1045-9219
1558-2183
DOI: 10.1109/TPDS.2021.3055265