BitMatcher: Bit-level Counter Adjustment for Sketches

Bibliographic Details
Published in: 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4815-4827
Main Authors: Shi, Qilong; Jia, Chengjun; Li, Wenjun; Liu, Zaoxing; Yang, Tong; Ji, Jianan; Xie, Gaogang; Zhang, Weizhe; Yu, Minlan
Format: Conference Proceeding
Language: English
Published: IEEE, 13.05.2024
Summary: Sketches are widely used in large-scale data stream processing. However, common fixed-counter algorithms such as Count-Min Sketch must allocate counters large enough for the most frequent items, which wastes a lot of memory because real-world data streams are highly skewed. To reduce memory usage, we propose to dynamically adjust the counter size to match the distribution of the data stream. We introduce BitMatcher, a fast global-adjusting algorithm that automatically adjusts each counter to the appropriate size for the data stream. During stream processing, BitMatcher identifies items hashed into a bucket by their isolated fingerprints. On overflow, BitMatcher changes the flag bits in the bucket and dynamically grows or shrinks some counters in a fine-grained manner. BitMatcher can also relocate a cold item from the bucket, borrowing the idea of cuckoo hashing, to preserve the potential hot item while achieving global load balancing. By handling overflow caused by skewed data in this way, BitMatcher precisely manipulates allocated bits and maximizes memory utilization. Experiments show that BitMatcher has high throughput and can outperform the state of the art by up to 4 orders of magnitude in accuracy. We also deployed BitMatcher on several platforms, showing its software and hardware scalability.
ISSN: 2375-026X
DOI: 10.1109/ICDE60146.2024.00366
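
Illustration (not from the paper): the summary's motivation, that fixed-width counters waste memory on skewed streams, can be seen with a textbook Count-Min Sketch. The sketch below is only the baseline the summary criticizes, not BitMatcher itself. The class name, the helper `make_skewed_stream`, the 32-bit counter width, and the 1/rank item weights are all assumptions chosen for the example.

```python
import hashlib
import random
from collections import Counter


class CountMinSketch:
    """Textbook Count-Min Sketch with fixed-width counters (baseline only)."""

    def __init__(self, width=1024, depth=3, counter_bits=32):
        self.width = width
        self.depth = depth
        self.counter_bits = counter_bits
        self.max_value = (1 << counter_bits) - 1  # saturate instead of wrapping
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; the row number is used as the salt.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            i = self._index(item, r)
            self.rows[r][i] = min(self.rows[r][i] + count, self.max_value)

    def estimate(self, item):
        # The minimum over rows never underestimates the true count
        # (as long as no counter has saturated).
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))

    def used_bit_fraction(self):
        # Share of allocated counter bits actually needed to hold the values.
        used = sum(c.bit_length() for row in self.rows for c in row)
        return used / (self.width * self.depth * self.counter_bits)


def make_skewed_stream(n_items=5000, length=50000, seed=0):
    """Hypothetical skewed stream: item i is drawn with weight 1/(i+1)."""
    rng = random.Random(seed)
    items = [f"flow{i}" for i in range(n_items)]
    weights = [1.0 / (i + 1) for i in range(n_items)]
    return rng.choices(items, weights=weights, k=length)


if __name__ == "__main__":
    stream = make_skewed_stream()
    cms = CountMinSketch()
    for x in stream:
        cms.add(x)
    top_item, top_count = Counter(stream).most_common(1)[0]
    print("hottest item:", top_item, "true:", top_count, "estimate:", cms.estimate(top_item))
    print("fraction of counter bits in use:", round(cms.used_bit_fraction(), 3))
```

Under these assumptions most counters hold small values, so only a minority of the allocated bits carry information. That unused headroom is the memory the summary proposes to reclaim by adjusting counter sizes at the bit level.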