Cardinality Estimation for Elephant Flows: A Compact Solution Based on Virtual Register Sharing

For many practical applications, it is a fundamental problem to estimate the flow cardinalities over big network data consisting of numerous flows (especially a large quantity of mouse flows mixed with a small number of elephant flows, whose cardinalities follow a power-law distribution). Traditiona...

Full description

Saved in:
Bibliographic Details
Published inIEEE/ACM transactions on networking Vol. 25; no. 6; pp. 3738 - 3752
Main Authors Xiao, Qingjun, Chen, Shigang, Zhou, You, Chen, Min, Luo, Junzhou, Li, Tengli, Ling, Yibei
Format Journal Article
LanguageEnglish
Published New York IEEE 01.12.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:For many practical applications, it is a fundamental problem to estimate the flow cardinalities over big network data consisting of numerous flows (especially a large quantity of mouse flows mixed with a small number of elephant flows, whose cardinalities follow a power-law distribution). Traditionally the research on this problem focused on using a small amount of memory to estimate each flow's cardinality from a large range (up to {10}^{{9}} ). However, although the memory needed for each individual flow has been greatly compressed, when there is an extremely large number of flows, the overall memory demand can still be very high, exceeding the availability under some important scenarios, such as implementing online measurement modules in network processors using only on-chip cache memory. In this paper, instead of allocating a separated data structure (called estimator) for each flow, we take a different path by viewing all the flows together as a whole: Each flow is allocated with a virtual estimator, and these virtual estimators share a common memory space. We discover that sharing at the multi-bit register level is superior than sharing at the bit level. We propose a unified framework of virtual estimators that allows us to apply the idea of sharing to an array of cardinality estimation solutions, e.g., HyperLogLog and PCSA, achieving far better memory efficiency than the best existing work. Our experiment shows that the new solution can work in a tight memory space of less than 1 bit per flow or even one tenth of a bit per flow - a quest that has never been realized before.
ISSN:1063-6692
1558-2566
DOI:10.1109/TNET.2017.2753842