ESTIMATING NUMBER OF DISTINCT VALUES IN A DATA SET USING MACHINE LEARNING

Bibliographic Details
Main Authors: Karnagel, Tomas; Agarwal, Nipun; Tauheed, Farhan; Kocberber, Onur
Format: Patent
Language: English
Published: 25.11.2021
Summary: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.
Bibliography: Application Number: US202016877882
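
The flow described in the Summary (sample, extract features, apply a learned model, store the estimate) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the two features (distinct values in the sample and values seen exactly once), the one-weight least-squares model, the 10% sampling fraction, the synthetic training data, and the hypothetical `catalog` dictionary and `orders.customer_id` key.

```python
import random
from collections import Counter


def extract_features(sample):
    """Return (d, f1): distinct values in the sample and values seen exactly once.

    These two features are an illustrative choice, not the patent's feature set.
    """
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return d, f1


def fit_weight(training_sets, fraction, rng):
    """Least-squares fit of w in the assumed model: true_distinct - d ~= w * f1."""
    num = den = 0.0
    for data in training_sets:
        sample = rng.sample(data, max(1, int(len(data) * fraction)))
        d, f1 = extract_features(sample)
        num += f1 * (len(set(data)) - d)
        den += f1 * f1
    return num / den if den else 0.0


def estimate_ndv(data, fraction, w, rng):
    """Sample a strict subset, compute features, and apply the learned weight."""
    sample = rng.sample(data, max(1, int(len(data) * fraction)))
    d, f1 = extract_features(sample)
    prediction = d + w * f1
    # Clamp: the data set has at least d and at most len(data) distinct values.
    return int(round(min(max(prediction, d), len(data))))


rng = random.Random(0)

# Synthetic training data sets with varying numbers of distinct values (assumed).
training_sets = [[rng.randrange(k) for _ in range(5000)]
                 for k in (50, 500, 2000, 5000)]
w = fit_weight(training_sets, 0.1, rng)

# Store the estimate in association with the data set (hypothetical catalog).
catalog = {}
column = [rng.randrange(1000) for _ in range(5000)]
catalog["orders.customer_id"] = estimate_ndv(column, 0.1, w, rng)
```

A single learned weight keeps the sketch short. A practical model would likely use more features and a richer learner, but the surrounding steps would stay the same.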