Record Completeness Evaluation Based on Multiple Data Sources

Bibliographic Details
Published in	2019 IEEE International Conference on Power Data Science (ICPDS) pp. 109 - 112
Main Authors	Wu, Aman; Li, LingLi; Xuan, Ping
Format	Conference Proceeding
Language	English
Published	IEEE 01.11.2019
Subjects	Computer science; Conferences; Data Completeness; Data integrity; Data quality; Data record completeness evaluation; Data science; Mathematical model; Pattern matching; Random algorithm; Signature; Urban areas

More Information
Summary:	Completeness is one of the central criteria for data quality. Data completeness is the degree to which data fully describe the objective world, and it is divided into value completeness and tuple completeness. This paper examines how to use multiple data sources to evaluate the record completeness of target data. Obtaining an exact record completeness evaluation, however, requires accessing all the data sources, which incurs huge costs and is unrealistic. This paper therefore presents a signature-based randomized estimator for record completeness evaluation. The basic idea of the randomized algorithm is to compute a signature for every data source in a single linear pass and use these signatures to quickly estimate the record sets covered by the data sources and by the target data set. The estimation time is independent of the size of each data source, which avoids the huge overhead of record-pair matching. Experimental results on real data demonstrate the effectiveness and efficiency of the algorithm.
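The summary does not specify the signature scheme, so the following is only a minimal sketch of how a signature-based randomized estimator of this kind can work, not the authors' method. It assumes (1) every record is identified by an exact, already-normalized string key, (2) record completeness is defined as |T| / |T ∪ S1 ∪ … ∪ Sn| for a target set T and sources Si, and (3) MinHash is used as the signature. Each signature is built in one linear pass; afterwards, the union's signature is the element-wise minimum of the individual signatures, so the estimate costs O(k · n) for k hash functions and n sources, independent of how many records each source holds. Because T is a subset of the union, the minimum-hash record of the union falls in T with probability |T| / |union|, which is exactly the quantity being estimated. The function names (`signature`, `estimate_completeness`) are hypothetical.

```python
import hashlib
from typing import Iterable, List, Sequence

SENTINEL = 2**64  # larger than any 64-bit hash value


def _hash(record_key: str, seed: int) -> int:
    # Seeded 64-bit hash: blake2b with a per-function salt (assumed choice)
    digest = hashlib.blake2b(
        record_key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(16, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def signature(records: Iterable[str], k: int) -> List[int]:
    # One linear pass over the records: keep the minimum hash
    # under each of the k seeded hash functions (MinHash signature).
    sig = [SENTINEL] * k
    for key in records:
        for i in range(k):
            h = _hash(key, i)
            if h < sig[i]:
                sig[i] = h
    return sig


def union_signature(sigs: Sequence[List[int]]) -> List[int]:
    # The MinHash signature of a union is the element-wise minimum.
    return [min(column) for column in zip(*sigs)]


def estimate_completeness(target_sig: List[int], source_sigs: Sequence[List[int]]) -> float:
    # Reference "world" = target plus all sources (assumed definition).
    world_sig = union_signature([target_sig, *source_sigs])
    # Since target is a subset of world, target_min == world_min exactly when
    # the world's minimum-hash record belongs to the target.
    matches = sum(1 for t, w in zip(target_sig, world_sig) if t == w)
    return matches / len(target_sig)
```

For example, with a target holding records `r0`..`r49` and a single source holding `r0`..`r99`, the true completeness under this definition is 0.5, and `estimate_completeness(signature(target, 256), [signature(source, 256)])` should return a value close to it; the standard error shrinks roughly as 1/sqrt(k).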
DOI:	10.1109/ICPDS47662.2019.9017199