Effective computation of biased quantiles over data streams
Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more u...
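The percentages in the example above can be reproduced with a small sketch. It assumes (this record does not state the formula) that the tolerated rank error for the quantile `phi` is `eps * (1 - phi)` of the stream length, so the tolerance shrinks toward the top of the distribution. With `eps = 0.1` this assumed rule yields 5, 1 and 0.1 percent for the 50th, 90th and 99th percentiles. The function names are hypothetical and the check runs against a fully sorted list; it is not the streaming summary the paper describes.

```python
from bisect import bisect_left, bisect_right


def high_biased_tolerance(phi, eps):
    """Tolerated rank error, as a fraction of n, under the ASSUMED rule eps * (1 - phi)."""
    return eps * (1.0 - phi)


def within_guarantee(sorted_data, phi, eps, answer):
    """Check whether `answer` is an acceptable phi-quantile of `sorted_data`.

    The answer is accepted when the range of ranks it occupies overlaps
    [phi*n - slack, phi*n + slack], where slack follows the assumed rule above.
    """
    n = len(sorted_data)
    target = phi * n
    slack = high_biased_tolerance(phi, eps) * n
    below = bisect_left(sorted_data, answer)      # items strictly smaller than answer
    at_most = bisect_right(sorted_data, answer)   # items smaller than or equal to answer
    return below <= target + slack and at_most >= target - slack


if __name__ == "__main__":
    data = list(range(1, 1001))  # exact ranks are easy to read off: value v has rank v
    for phi in (0.50, 0.90, 0.99):
        print(phi, round(100 * high_biased_tolerance(phi, 0.1), 3), "percent")
    print(within_guarantee(data, 0.50, 0.1, 540))  # 40 ranks off, slack is 50
    print(within_guarantee(data, 0.99, 0.1, 980))  # 10 ranks off, slack is about 1
```

Under this assumed rule the same absolute miss of a few dozen ranks is acceptable at the median but not at the 99th percentile, which is the asymmetry the summary describes.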
Published in | 21st International Conference on Data Engineering (ICDE'05), pp. 20-31 |
---|---|
Main Authors | Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D. |
Format | Conference Proceeding |
Language | English |
Published | IEEE 2005 |
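Subjects | Displays; Distributed computing; Fluid flow measurement; Probability distribution; Statistics; TCPIP; Telecommunication traffic; Time measurement |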
Abstract | Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the "high-biased" and the "targeted" quantiles problems, respectively, and presenting algorithms with provable guarantees that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams. |
---|---|
Author | Srivastava, D.; Korn, F.; Muthukrishnan, S.; Cormode, G. |
Author_xml | – sequence: 1 givenname: G. surname: Cormode fullname: Cormode, G. organization: Lucent Technol. Bell Labs., PA, USA – sequence: 2 givenname: F. surname: Korn fullname: Korn, F. – sequence: 3 givenname: S. surname: Muthukrishnan fullname: Muthukrishnan, S. – sequence: 4 givenname: D. surname: Srivastava fullname: Srivastava, D. |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICDE.2005.55 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Statistics Computer Science |
EISSN | 2375-026X |
EndPage | 31 |
ExternalDocumentID | 1410103 |
Genre | orig-research |
GroupedDBID | 6IE 6IH CBEJK RIE RIO |
ID | FETCH-LOGICAL-i213t-29068c4f160bc3cb1e6e7d0eb1713fca9ba6408fb8919c08956e80d5036cfe353 |
IEDL.DBID | RIE |
ISBN | 0769522858 9780769522852 |
ISSN | 1063-6382 |
IngestDate | Wed Jun 26 19:21:17 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i213t-29068c4f160bc3cb1e6e7d0eb1713fca9ba6408fb8919c08956e80d5036cfe353 |
OpenAccessLink | http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf |
PageCount | 12 |
ParticipantIDs | ieee_primary_1410103 |
PublicationCentury | 2000 |
PublicationDate | 20050000 |
PublicationDateYYYYMMDD | 2005-01-01 |
PublicationDate_xml | – year: 2005 text: 20050000 |
PublicationDecade | 2000 |
PublicationTitle | 21st International Conference on Data Engineering (ICDE'05) |
PublicationTitleAbbrev | ICDE |
PublicationYear | 2005 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0000941150 ssj0000451731 |
Score | 1.8128089 |
Snippet | Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles... |
SourceID | ieee |
SourceType | Publisher |
StartPage | 20 |
SubjectTerms | Displays; Distributed computing; Fluid flow measurement; Probability distribution; Statistics; TCPIP; Telecommunication traffic; Time measurement |
Title | Effective computation of biased quantiles over data streams |
URI | https://ieeexplore.ieee.org/document/1410103 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |