The Adaptive sampling revisited

Bibliographic Details
Published in: Discrete Mathematics and Theoretical Computer Science, Vol. 21 no. 3 (Analysis of Algorithms), pp. COV1-27
Main Authors: Drescher, Matthew; Louchard, Guy; Swan, Yvik
Format: Journal Article, Web Resource
Language: English
Published: Nancy: DMTCS, 01.07.2019; Maison de l'informatique et des mathématiques discrètes; Discrete Mathematics & Theoretical Computer Science

More Information
Summary: The problem of estimating the number $n$ of distinct keys in a large collection of $N$ data items is well known in computer science. A classical algorithm is adaptive sampling (AS). The number $n$ can be estimated by $R\,2^J$, where $R$ is the final bucket size and $J$ is the final depth at the end of the process. Several new and interesting questions can be asked about AS (some of them were suggested by P. Flajolet and popularized by J. Lumbroso). The distribution of $W = \log(R\,2^J/n)$ is known; we rederive it in a simpler way. We provide new results on the moments of $J$ and $W$. We also analyze the distribution of the final cache size $R$. We consider colored keys: assume that among the $n$ distinct keys, $m$ have color $K$. We show how to estimate $p = m/n$. We study keys with some multiplicity: we provide a way to estimate the total number $M$ of keys of color $K$ among the total number $N$ of keys. We consider the case where we know the multiplicities a priori but not the colors; there we want to estimate the total number of keys $N$. An appendix is devoted to the case where the hashing function provides bits with probability different from 1/2.
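The abstract states the estimator $R\,2^J$ only in outline. The Python sketch below shows one pass of adaptive sampling under stated assumptions. A 64-bit blake2b hash stands in for an ideal hash with fair bits; the paper's appendix also treats biased bits, which this sketch does not. The names `adaptive_sampling`, `capacity`, `estimate_distinct` and `estimate_color_fraction` are hypothetical. The colored-fraction helper is a simple plug-in estimate of $p = m/n$ (the colored share of the final sample), not necessarily the estimator derived in the paper.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple

HASH_BITS = 64  # assumption: hash values behave like 64 fair, independent bits


def hash_bits(key: str) -> int:
    """Map a key to a 64-bit integer; blake2b stands in for an ideal hash."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def starts_with_zeros(h: int, depth: int) -> bool:
    """True if the leading `depth` bits of the HASH_BITS-bit value h are all 0."""
    return (h >> (HASH_BITS - depth)) == 0


def adaptive_sampling(keys: Iterable[str], capacity: int) -> Tuple[int, int, Dict[str, int]]:
    """One pass of adaptive sampling; returns (R, J, sample).

    The sample keeps every distinct key whose hash starts with J zero bits.
    Whenever it holds more than `capacity` keys, the depth J grows by one and
    keys whose next hash bit is 1 are evicted, halving the sampling rate.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    depth = 0
    sample: Dict[str, int] = {}
    for key in keys:
        if key in sample:
            continue  # duplicates never change the sample
        h = hash_bits(key)
        if not starts_with_zeros(h, depth):
            continue  # key lies outside the current sampling region
        sample[key] = h
        while len(sample) > capacity:  # overflow: deepen and filter
            depth += 1
            sample = {k: v for k, v in sample.items() if starts_with_zeros(v, depth)}
    return len(sample), depth, sample


def estimate_distinct(keys: Iterable[str], capacity: int) -> int:
    """Estimate the number n of distinct keys by R * 2**J."""
    r, j, _ = adaptive_sampling(keys, capacity)
    return r * 2 ** j


def estimate_color_fraction(sample: Dict[str, int], is_colored: Callable[[str], bool]) -> float:
    """Plug-in estimate of p = m/n: the colored fraction of the final sample."""
    if not sample:
        raise ValueError("empty sample")
    return sum(1 for k in sample if is_colored(k)) / len(sample)
```

For example, `estimate_distinct([f"user{i % 5000}" for i in range(20000)], 256)` should return a value near 5000, though generally not exactly 5000, because $R\,2^J$ is a random estimate whose spread depends on the bucket size. Evicting by the next hash bit keeps the sample a uniformly hashed subset of the distinct keys, so duplicates in the stream cannot bias it.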
ISSN: 1365-8050; 1462-7264
DOI: 10.23638/DMTCS-21-3-13