Interval Estimation for Aggregate Queries on Incomplete Data

Bibliographic Details
Published in: Journal of computer science and technology Vol. 34; no. 6; pp. 1203-1216
Main Authors: Zhang, An-Zhen; Li, Jian-Zhong; Gao, Hong
Format: Journal Article
Language: English
Published: New York Springer US 01.11.2019
Springer
Springer Nature B.V
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
ISSN: 1000-9000
1860-4749
DOI: 10.1007/s11390-019-1970-4

Summary: Incomplete data has been a longstanding issue in the database community, yet it remains poorly handled in both theory and practice. One common way to cope with missing values is to impute them (fill them in) as a preprocessing step before analysis. Unfortunately, no single imputation method can impute all missing values correctly in all cases, and users can hardly trust query results on such completed data without any confidence guarantee. In this paper, we propose to directly estimate the aggregate query result on incomplete data rather than impute the missing values. An interval estimate, composed of the upper and lower bounds of the aggregate query result over all possible interpretations of the missing values, is presented to end users. The ground-truth aggregate result is guaranteed to lie within the interval. We believe that decision support applications could benefit significantly from the estimation, since they can tolerate inexact answers as long as the results carry clearly defined semantics and guarantees. Our main techniques are parameter-free and do not assume prior knowledge about the data distribution or the missingness mechanism. Experimental results are consistent with the theoretical results and suggest that the estimation is invaluable for better assessing the results of aggregate queries on incomplete data.
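
To make the idea of an interval over "all possible interpretations of missing values" concrete, the following is a minimal sketch and not the authors' algorithm, which this record does not describe. It assumes, purely for illustration, that every missing value is known to lie in a fixed range `[lo, hi]`; the function names `sum_interval` and `avg_interval` and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sum_interval(values: Sequence[Optional[float]],
                 lo: float, hi: float) -> Tuple[float, float]:
    """Bounds on SUM(values) when each None may be any value in [lo, hi].

    The range [lo, hi] is an illustrative assumption, not something the
    summary above specifies.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    known = sum(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    # SUM is monotone in each missing value, so the extremes are reached
    # by setting every missing value to lo (lower) or to hi (upper).
    return known + missing * lo, known + missing * hi


def avg_interval(values: Sequence[Optional[float]],
                 lo: float, hi: float) -> Tuple[float, float]:
    """Bounds on AVG(values) over all rows, missing ones included."""
    if not values:
        raise ValueError("values must be non-empty")
    s_lo, s_hi = sum_interval(values, lo, hi)
    n = len(values)
    return s_lo / n, s_hi / n


if __name__ == "__main__":
    salaries = [50.0, None, 70.0, None]  # None marks a missing value
    print(sum_interval(salaries, 30.0, 90.0))  # (180.0, 300.0)
    print(avg_interval(salaries, 30.0, 90.0))  # (45.0, 75.0)
```

Under the stated range assumption, whatever the true missing values are, the true SUM and AVG fall inside the returned intervals, which is the kind of guarantee the summary refers to.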