Effective computation of biased quantiles over data streams

Bibliographic Details
Published in 21st International Conference on Data Engineering (ICDE'05), pp. 20-31
Main Authors Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.
Format Conference Proceeding
Language English
Published IEEE 2005

Abstract Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the "high-biased" and the "targeted" quantiles problems, respectively, and presenting algorithms with provable guarantees that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.
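The error guarantees quoted in the abstract can be made concrete with a small offline check. The sketch below is not the paper's streaming algorithm; it only illustrates, under the assumption that an "error of eps" means a rank error of at most eps times the number of items seen, which ranks would count as acceptable answers for each target. The names `TARGETS`, `rank_window` and `is_acceptable` are hypothetical and chosen for this illustration.

```python
import bisect
import math
from fractions import Fraction

# (quantile phi, allowed rank error eps) pairs from the abstract's example:
# 50th, 90th and 99th percentiles with errors of 5, 1 and 0.1 percent.
# Fractions keep the window arithmetic exact (no floating-point rounding).
TARGETS = [
    (Fraction(50, 100), Fraction(5, 100)),
    (Fraction(90, 100), Fraction(1, 100)),
    (Fraction(99, 100), Fraction(1, 1000)),
]


def rank_window(n, phi, eps):
    """Inclusive 1-based rank range [(phi - eps) * n, (phi + eps) * n], clipped to [1, n]."""
    lo = max(1, math.ceil((phi - eps) * n))
    hi = min(n, math.floor((phi + eps) * n))
    return lo, hi


def is_acceptable(sorted_values, phi, eps, estimate):
    """True if `estimate` (assumed to be an item of the data) has a rank inside the window."""
    lo, hi = rank_window(len(sorted_values), phi, eps)
    first = bisect.bisect_left(sorted_values, estimate) + 1  # lowest rank of estimate
    last = bisect.bisect_right(sorted_values, estimate)      # highest rank of estimate
    return first <= hi and last >= lo


if __name__ == "__main__":
    data = list(range(1, 1001))  # toy data: the value equals its rank
    for phi, eps in TARGETS:
        print(float(phi), rank_window(len(data), phi, eps))
    # Rank 985 fails the 0.1 percent bound at the 99th percentile,
    # although a uniform 5 percent bound would have accepted it.
    print(is_acceptable(data, Fraction(99, 100), Fraction(1, 1000), 985))
    print(is_acceptable(data, Fraction(99, 100), Fraction(5, 100), 985))
```

On this toy input the acceptable window narrows from about a hundred ranks at the median to three ranks at the 99th percentile, which is the sense in which the guarantee is finer at higher ranks.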
Author affiliation Cormode, G.: Lucent Technol. Bell Labs., PA, USA
DOI 10.1109/ICDE.2005.55
Discipline Statistics
Computer Science
EISSN 2375-026X
EndPage 31
ExternalDocumentID 1410103
Genre orig-research
ISBN 0769522858
9780769522852
ISSN 1063-6382
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf
PageCount 12
PublicationTitle 21st International Conference on Data Engineering (ICDE'05)
PublicationTitleAbbrev ICDE
PublicationYear 2005
Publisher IEEE
StartPage 20
SubjectTerms Displays
Distributed computing
Fluid flow measurement
Probability distribution
Statistics
TCPIP
Telecommunication traffic
Time measurement
Title Effective computation of biased quantiles over data streams
URI https://ieeexplore.ieee.org/document/1410103