Pebbles: Leveraging Sketches for Processing Voluminous, High Velocity Data Streams
Published in | IEEE transactions on parallel and distributed systems Vol. 32; no. 8; pp. 2005 - 2020 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.08.2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Summary: | Voluminous, time-series data streams originating in continuous sensing environments pose data ingestion and processing challenges. We present a holistic methodology centered around data sketching to address both challenges. We introduce an order-preserving sketching algorithm that we have designed for space-efficient representation of multi-feature streams with native support for stream processing related operations. Observational streams are preprocessed at the edges of the network generating sketched streams to reduce data transfer costs and energy consumption. Ingested sketched streams are then processed using sketch-aware extensions to existing stream processing APIs delivering improved performance. Our benchmarks with real-world datasets show up to a ~8× reduction in data volumes transferred and a ~27× improvement in throughput. |
---|---|
ISSN: | 1045-9219 1558-2183 |
DOI: | 10.1109/TPDS.2021.3055265 |