Semantic Optimization of Conjunctive Queries

Bibliographic Details
Published in	Journal of the ACM Vol. 67; no. 6; pp. 1 - 60
Main Authors	Barceló, Pablo; Figueira, Diego; Gottlob, Georg; Pieris, Andreas
Format	Journal Article
Language	English
Published	New York: Association for Computing Machinery, 01.11.2020
Subjects	Complexity; Computer Science; Containment; Discrete Mathematics; Equivalence; Formal Languages and Automata Theory; Logic in Computer Science; Optimization; Optimization techniques; Queries; Questions; Semantics
ISSN	0004-5411 1557-735X
DOI	10.1145/3424908

Summary:	This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypertreewidth bounded by a fixed constant k ≥ 1; the associated fragment is denoted GHW_k. A CQ is semantically in GHW_k if it is equivalent to a CQ in GHW_k. The problem of checking whether a CQ is semantically in GHW_k has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in GHW_k under the constraints, while not being semantically in GHW_k without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in GHW_k, for a fixed k ≥ 1, under the constraints, or, in other words, is the query equivalent to one that belongs to GHW_k over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW_1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs.
In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHW_k alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHW_k, we discuss how it can be approximated via a CQ that falls in GHW_k in an optimal way. Such approximations might help in finding “quick” answers to the input query when exact evaluation is intractable.
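Illustration (not part of the published summary): the claim that a CQ can be semantically in GHW_k under constraints, but not without them, can be made concrete with a small sketch. The relation names (Interest, Class, Owns), the toy full TGD, and the helper functions below are assumptions chosen for illustration only. The sketch compares a cyclic "triangle" CQ with an acyclic CQ (acyclic CQs are those in GHW_1), using brute-force evaluation on two tiny databases.

```python
from itertools import product

def holds(query, db):
    """Return True if the Boolean CQ `query` has a match in `db` (brute force)."""
    variables = sorted({v for _, args in query for v in args})
    domain = {c for tuples in db.values() for t in tuples for c in t}
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))  # candidate variable assignment
        if all(tuple(h[v] for v in args) in db.get(rel, set())
               for rel, args in query):
            return True
    return False

def satisfies_tgd(db):
    """Check the assumed full TGD: Interest(x, z), Class(y, z) -> Owns(x, y)."""
    return all((x, y) in db["Owns"]
               for (x, z) in db["Interest"]
               for (y, z2) in db["Class"] if z == z2)

# Cyclic query: Interest(x, z), Class(y, z), Owns(x, y) form a triangle.
cyclic = [("Interest", ("x", "z")), ("Class", ("y", "z")), ("Owns", ("x", "y"))]
# Acyclic candidate: the same query without the Owns atom.
acyclic = [("Interest", ("x", "z")), ("Class", ("y", "z"))]

# A database that satisfies the TGD, and one that violates it.
db_ok = {"Interest": {("ann", "jazz")}, "Class": {("r1", "jazz")},
         "Owns": {("ann", "r1")}}
db_bad = {"Interest": {("ann", "jazz")}, "Class": {("r1", "jazz")},
          "Owns": set()}

print(satisfies_tgd(db_ok), holds(cyclic, db_ok), holds(acyclic, db_ok))
print(satisfies_tgd(db_bad), holds(cyclic, db_bad), holds(acyclic, db_bad))
```

On the database that satisfies the TGD, the two queries agree; on the one that violates it, they differ. This reflects the fact that the Owns atom is implied by the other two atoms only when the constraint holds. Two sample databases do not establish equivalence; they only illustrate the phenomenon the summary describes.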