BitMatcher: Bit-level Counter Adjustment for Sketches
Sketch has been widely used in the field of large-scale data stream processing. However, common fixed-counter algorithms such as Count-Min Sketch have to allocate larger counters, which wastes a lot of memory due to the high skewness of real-world data streams. To reduce memory usage, we propose to...
Published in | 2024 IEEE 40th International Conference on Data Engineering (ICDE) pp. 4815 - 4827 |
---|---|
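Main Authors | Shi, Qilong; Jia, Chengjun; Li, Wenjun; Liu, Zaoxing; Yang, Tong; Ji, Jianan; Xie, Gaogang; Zhang, Weizhe; Yu, Minlan |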
Format | Conference Proceeding |
Language | English |
Published | IEEE, 13.05.2024 |
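Subjects | Accuracy; Approximate algorithm; Data stream; Heuristic algorithms; Load management; Memory management; Scalability; Sketch; Software algorithms; Throughput |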
Abstract | Sketches are widely used in large-scale data stream processing. However, common fixed-counter algorithms such as Count-Min Sketch must allocate large counters, which wastes memory because real-world data streams are highly skewed. To reduce memory usage, we propose dynamically adjusting the counter size to match the distribution of the data stream. We introduce BitMatcher, a fast global-adjusting algorithm that automatically adjusts counters to the appropriate size to match the data stream. During stream processing, BitMatcher identifies items hashed into a bucket by their isolated fingerprints. On overflow, BitMatcher changes the flag bits in the bucket and dynamically grows or shrinks some counters in a fine-grained manner. BitMatcher can also relocate a cold item in the bucket, borrowing the idea of cuckoo hashing, to preserve a potentially hot item while achieving global load balancing. By handling overflow caused by skewed data in this way, BitMatcher precisely manipulates allocated bits and maximizes memory utilization. Experiments show that BitMatcher has high throughput and can outperform the state of the art by up to 4 orders of magnitude in accuracy. We also deployed BitMatcher on several platforms, demonstrating its software and hardware scalability. |
---|---|
Author | Jia, Chengjun; Yang, Tong; Zhang, Weizhe; Li, Wenjun; Shi, Qilong; Liu, Zaoxing; Xie, Gaogang; Yu, Minlan; Ji, Jianan |
Author_xml | 1. Shi, Qilong (Tsinghua University; Peng Cheng Laboratory); 2. Jia, Chengjun (Tsinghua University); 3. Li, Wenjun (Peng Cheng Laboratory; Harvard University); 4. Liu, Zaoxing (University of Maryland); 5. Yang, Tong (Peking University; Peng Cheng Laboratory); 6. Ji, Jianan (Peking University); 7. Xie, Gaogang (Chinese Academy of Sciences); 8. Zhang, Weizhe (Peng Cheng Laboratory); 9. Yu, Minlan (Harvard University) |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DOI | 10.1109/ICDE60146.2024.00366 |
Discipline | Computer Science |
EISBN | 9798350317152 |
EISSN | 2375-026X |
EndPage | 4827 |
ExternalDocumentID | 10597781 |
Genre | orig-research |
GrantInformation_xml | China Postdoctoral Science Foundation (grants 2020TQ0158, 2020M682825; funder ID 10.13039/501100002858); National Key Research and Development Program of China (grant 2022ZD0115303; funder ID 10.13039/501100012166); National Natural Science Foundation of China (grants 62102203, U20A20179, 62372009, 62072430; funder ID 10.13039/501100001809) |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PublicationCentury | 2000 |
PublicationDate | 2024-May-13 |
PublicationDateYYYYMMDD | 2024-05-13 |
PublicationDate_xml | – month: 05 year: 2024 text: 2024-May-13 day: 13 |
PublicationDecade | 2020 |
PublicationTitle | 2024 IEEE 40th International Conference on Data Engineering (ICDE) |
PublicationTitleAbbrev | ICDE |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 4815 |
SubjectTerms | Accuracy; Approximate algorithm; Data stream; Heuristic algorithms; Load management; Memory management; Scalability; Sketch; Software algorithms; Throughput |
Title | BitMatcher: Bit-level Counter Adjustment for Sketches |
URI | https://ieeexplore.ieee.org/document/10597781 |
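
Illustrative note on the fixed-counter problem named in the abstract: the following is a minimal sketch of a plain Count-Min Sketch, not of BitMatcher itself. The class name `CountMinSketch`, the helper `run_skewed_stream`, the saturating-counter behaviour, and all sizes are assumptions chosen for illustration. On a skewed stream (one hot item, many cold ones), narrow counters clip the hot item, while wide counters count it correctly but leave almost every counter far below its capacity.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min Sketch with fixed-width, saturating counters (illustrative)."""

    def __init__(self, width, depth, counter_bits):
        self.width = width
        self.depth = depth
        # Largest value a counter of this width can hold.
        self.max_value = (1 << counter_bits) - 1
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent hash per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            i = self._index(item, r)
            # Saturate instead of wrapping around on overflow (an assumption).
            self.rows[r][i] = min(self.rows[r][i] + count, self.max_value)

    def estimate(self, item):
        # The minimum over rows is the standard Count-Min estimate.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


def run_skewed_stream(counter_bits, width=64, depth=3):
    """Feed one hot item (1000 occurrences) and 50 cold items (1 each)."""
    cms = CountMinSketch(width, depth, counter_bits)
    cms.add("hot", 1000)
    for i in range(50):
        cms.add(f"cold-{i}")
    return cms


narrow = run_skewed_stream(counter_bits=8)   # cheap, but clips the hot item
wide = run_skewed_stream(counter_bits=16)    # accurate, but mostly idle bits

# Counters that actually need more than 8 bits in the wide sketch.
needs_wide = sum(1 for row in wide.rows for c in row if c > 255)
total = wide.width * wide.depth

print("8-bit estimate of hot item:", narrow.estimate("hot"))
print("16-bit estimate of hot item:", wide.estimate("hot"))
print(f"{needs_wide} of {total} counters need more than 8 bits")
```

In this toy setting only the counters touched by the hot item need the extra width, which is the kind of mismatch that motivates adjusting counter sizes to the stream rather than fixing them up front.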