BitMatcher: Bit-level Counter Adjustment for Sketches

Bibliographic Details
Published in: 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4815-4827
Main Authors: Shi, Qilong; Jia, Chengjun; Li, Wenjun; Liu, Zaoxing; Yang, Tong; Ji, Jianan; Xie, Gaogang; Zhang, Weizhe; Yu, Minlan
Format: Conference Proceeding
Language: English
Published: IEEE, 13.05.2024
Abstract: Sketches are widely used in large-scale data stream processing. However, common fixed-counter algorithms such as Count-Min Sketch must allocate large counters, which wastes much memory because real-world data streams are highly skewed. To reduce memory usage, we propose dynamically adjusting the counter size to match the distribution of the data stream. We introduce BitMatcher, a fast global-adjusting algorithm that automatically adjusts each counter to the size appropriate for the data stream. During stream processing, BitMatcher identifies items hashed into a bucket by their isolated fingerprints. On overflow, BitMatcher changes the flag bits in the bucket and dynamically grows or shrinks some counters in a fine-grained manner. BitMatcher can also relocate a cold item in the bucket, following the idea of cuckoo hashing, to preserve the potential hot item while achieving global load balancing. By handling overflow caused by skewed data in this way, BitMatcher precisely manipulates allocated bits and maximizes memory utilization. Experiments show that BitMatcher has high throughput and can outperform the state of the art by up to 4 orders of magnitude in accuracy. We also deployed BitMatcher on several platforms, showing its software and hardware scalability.
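The memory problem that motivates the abstract can be illustrated with the fixed-counter baseline it names. The sketch below is a plain Count-Min Sketch with saturating counters of one fixed width; it is not BitMatcher, and the names CountMinSketch, bits_needed, wasted_fraction, and build_skewed_stream, as well as the stream itself, are assumptions made only for illustration.

```python
import hashlib


class CountMinSketch:
    """Count-Min Sketch whose counters all share one fixed bit width."""

    def __init__(self, depth, width, counter_bits):
        self.depth = depth
        self.width = width
        self.max_value = (1 << counter_bits) - 1  # counters saturate here
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One independent hash per row, derived by salting with the row id.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        for r in range(self.depth):
            i = self._index(r, item)
            self.rows[r][i] = min(self.rows[r][i] + count, self.max_value)

    def query(self, item):
        # The minimum over rows is the Count-Min estimate.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


def bits_needed(value):
    """Bits required to store this value (at least one bit per counter)."""
    return max(1, value.bit_length())


def wasted_fraction(sketch, counter_bits):
    """Share of allocated counter bits that the stored values do not need."""
    used = sum(bits_needed(c) for row in sketch.rows for c in row)
    total = sketch.depth * sketch.width * counter_bits
    return 1 - used / total


def build_skewed_stream():
    """Hypothetical skewed stream: one hot item, many items seen once."""
    return ["hot"] * 1000 + [f"cold-{i}" for i in range(200)]


big = CountMinSketch(depth=3, width=256, counter_bits=16)
small = CountMinSketch(depth=3, width=256, counter_bits=4)
for item in build_skewed_stream():
    big.update(item)
    small.update(item)

# 16-bit counters hold the hot item but leave most allocated bits unused;
# 4-bit counters save memory but cap the hot item's estimate at 15.
print(big.query("hot"), small.query("hot"), round(wasted_fraction(big, 16), 2))
```

With wide counters the hot item is counted correctly but most bits sit idle in cold cells; with narrow counters the hot item saturates. Adjusting counter sizes per item, as the abstract proposes, targets this trade-off.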
Author Affiliations
1. Shi, Qilong: Tsinghua University; Peng Cheng Laboratory
2. Jia, Chengjun: Tsinghua University
3. Li, Wenjun: Peng Cheng Laboratory; Harvard University
4. Liu, Zaoxing: University of Maryland
5. Yang, Tong: Peking University; Peng Cheng Laboratory
6. Ji, Jianan: Peking University
7. Xie, Gaogang: Chinese Academy of Sciences
8. Zhang, Weizhe: Peng Cheng Laboratory
9. Yu, Minlan: Harvard University
DOI: 10.1109/ICDE60146.2024.00366
Discipline: Computer Science
EISBN: 9798350317152
EISSN: 2375-026X
Funding
China Postdoctoral Science Foundation (funder ID 10.13039/501100002858): grants 2020TQ0158, 2020M682825
National Key Research and Development Program of China (funder ID 10.13039/501100012166): grant 2022ZD0115303
National Natural Science Foundation of China (funder ID 10.13039/501100001809): grants 62102203, U20A20179, 62372009, 62072430
Subjects: Accuracy; Approximate algorithm; Data stream; Heuristic algorithms; Load management; Memory management; Scalability; Sketch; Software algorithms; Throughput
URI: https://ieeexplore.ieee.org/document/10597781