Synthesizing Linked Data Under Cardinality and Integrity Constraints
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 26.03.2021 |
Subjects | |
Summary: | The generation of synthetic data is useful in multiple aspects, from testing
applications to benchmarking to privacy preservation. Generating the links
between relations, subject to cardinality constraints (CCs) and integrity
constraints (ICs), is an important aspect of this problem. Given instances of
two relations, where one references the other through a foreign key and is
missing its foreign key ($FK$) values, and two types of constraints: (1) CCs
that apply to the join view and (2) ICs that apply to the table with missing
$FK$ values, our goal is to impute the missing $FK$ values such that the
constraints are satisfied. We provide a novel framework for the problem based
on declarative CCs and ICs. We further show that the problem is NP-hard and
propose a novel two-phase solution that guarantees the satisfaction of the ICs.
Phase I yields an intermediate solution that accounts for the CCs alone and
takes a hybrid approach that depends on the CC type. For one type, the problem is
modeled as an Integer Linear Program. For the others, we describe an efficient
and accurate solution. We then combine the two solutions. Phase II augments
this solution by incorporating the ICs and uses a coloring of the conflict
hypergraph to infer the values of the $FK$ column. Our extensive experimental
study shows that our solution scales well as the data size and the number of
constraints increase. We further show that our solution maintains low error
rates for the CCs. |
---|---|
DOI: | 10.48550/arxiv.2103.14435 |
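To make the problem statement concrete, here is a minimal Python sketch on a toy instance. Everything in it is hypothetical and not taken from the paper: a parent relation Households(hid, area), a child relation Persons(pid, age, role) whose FK column hid is missing, one CC on the join view ("exactly 2 rows have age < 18 and area = urban"), and one IC on the child table written as a denial constraint ("no two persons with role head share a household"). The sketch only checks a candidate FK imputation against both constraints; it does not search for one.

```python
from collections import Counter

# Hypothetical parent relation Households(hid, area); hid is the primary key.
households = {1: "urban", 2: "urban", 3: "rural"}

# Hypothetical child relation Persons(pid, age, role) whose FK column hid is missing.
persons = [
    {"pid": 1, "age": 40, "role": "head"},
    {"pid": 2, "age": 35, "role": "head"},
    {"pid": 3, "age": 10, "role": "child"},
    {"pid": 4, "age": 8, "role": "child"},
    {"pid": 5, "age": 70, "role": "head"},
]


def join_view(fk):
    """Materialize Persons joined with Households under an imputed FK map pid -> hid."""
    return [dict(p, hid=fk[p["pid"]], area=households[fk[p["pid"]]]) for p in persons]


def cc_count(rows):
    """Hypothetical CC on the join view: count rows with age < 18 and area = 'urban'."""
    return sum(1 for r in rows if r["age"] < 18 and r["area"] == "urban")


def ic_holds(fk):
    """Hypothetical IC (a denial constraint) on Persons: no two heads share an hid."""
    heads = Counter(fk[p["pid"]] for p in persons if p["role"] == "head")
    return all(c <= 1 for c in heads.values())


def check(fk, cc_target=2):
    """Return (observed CC count, whether the CC target is met, whether the IC holds)."""
    observed = cc_count(join_view(fk))
    return observed, observed == cc_target, ic_holds(fk)


good = {1: 1, 2: 2, 3: 1, 4: 2, 5: 3}
bad = {1: 1, 2: 1, 3: 3, 4: 3, 5: 3}
print(check(good))  # (2, True, True)
print(check(bad))   # (0, False, False)
```

The second imputation fails on both counts: it sends both children to the rural household, so the CC count drops to 0, and it puts two heads in household 1, which violates the IC.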
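The summary states that Phase II colors a conflict hypergraph to infer the FK values. The sketch below is a generic greedy hypergraph coloring under stated assumptions, not the authors' algorithm: nodes are child tuples, colors are parent keys, and each hyperedge is a set of tuples that would violate an IC if all of them received the same FK value. Each tuple first tries a preferred value, a hypothetical stand-in for a CC-only intermediate assignment, and falls back to other keys only when the preferred value would make some hyperedge monochromatic. The function name `color_hypergraph` and its parameters are illustrative.

```python
def color_hypergraph(nodes, hyperedges, colors, preferred):
    """Greedily give every node a color so that no hyperedge is monochromatic.

    nodes: child-tuple ids, in the order in which they are colored.
    hyperedges: frozensets of node ids that must not all receive the same FK value.
    colors: candidate FK values (primary keys of the parent relation).
    preferred: dict mapping a node to an FK value suggested by a CC-only step.
    """
    incident = {v: [] for v in nodes}
    for edge in hyperedges:
        for v in edge:
            incident[v].append(edge)

    assignment = {}
    for v in nodes:
        forbidden = set()
        for edge in incident[v]:
            others = [u for u in edge if u != v]
            # Giving v color c makes this edge monochromatic only when every other
            # member is already colored and all of them share that single color c.
            if others and all(u in assignment for u in others):
                used = {assignment[u] for u in others}
                if len(used) == 1:
                    forbidden |= used

        options = list(colors)
        if preferred.get(v) in options:
            # Try the CC-driven suggestion first, then fall back to the other keys.
            options.remove(preferred[v])
            options.insert(0, preferred[v])

        choice = next((c for c in options if c not in forbidden), None)
        if choice is None:
            raise ValueError(f"no IC-consistent FK value left for tuple {v}")
        assignment[v] = choice
    return assignment


# Tuples 1, 2 and 5 are household heads; the IC "no two heads share a household"
# yields one hyperedge (here, a pair) for every two heads.
edges = [frozenset({1, 2}), frozenset({1, 5}), frozenset({2, 5})]
suggested = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3}
print(color_hypergraph([1, 2, 3, 4, 5], edges, [1, 2, 3], suggested))
# {1: 1, 2: 2, 3: 1, 4: 2, 5: 3}
```

Only tuple 2 departs from its suggested value, because tuple 1 already holds key 1. Since the coloring order is fixed, this greedy pass can report failure on instances that another ordering would solve; it illustrates IC-driven reassignment rather than providing a guarantee.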