Automatic Discovery of High-Level Provenance Using Semantic Similarity

Bibliographic Details
Published in	Provenance and Annotation of Data and Processes, pp. 97-110
Main Authors	De Nies, Tom; Coppens, Sam; Van Deursen, Davy; Mannens, Erik; Van de Walle, Rik
Format	Book Chapter
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg
Series	Lecture Notes in Computer Science
Subjects	Data Model; Linked Data; News; Provenance; Semantic Web; Similarity

Summary:	As interest in provenance grows among the Semantic Web community, it is recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance, or require that the user discloses formal workflows. In this paper, we propose a new approach for automatic discovery of provenance, at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation in many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work.
ISBN:	3642342213 9783642342219
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-642-34222-6_8
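
The summary describes detecting entity derivations from similarity and expressing them in PROV-DM. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' method. It assumes a plain token-overlap (Jaccard) score as a stand-in for semantic similarity, omits the clustering and linked data steps entirely, and uses a hypothetical `Document` type, a hypothetical threshold of 0.5, and hypothetical `ex:` identifiers. Output is written as PROV-N `wasDerivedFrom(derived, source)` statements.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Document:
    """Hypothetical record type; none of these fields come from the paper."""
    doc_id: str      # e.g. "ex:article"
    published: int   # any increasing number standing in for a timestamp
    text: str


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: a crude stand-in for semantic similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def discover_derivations(docs, threshold=0.5):
    """Return (derived_id, source_id, score) for similar pairs.

    A pair qualifies only if the candidate source was published strictly
    earlier than the derived document and the score reaches the
    (assumed) threshold.
    """
    found = []
    for derived, source in permutations(docs, 2):
        if source.published >= derived.published:
            continue  # a source cannot be newer than what derives from it
        score = jaccard(derived.text, source.text)
        if score >= threshold:
            found.append((derived.doc_id, source.doc_id, score))
    return found


def to_prov_n(derivations):
    """Render derivations as PROV-N statements (derived entity first)."""
    return [f"wasDerivedFrom({d}, {s})" for d, s, _ in derivations]
```

A usage example with made-up documents: given an earlier wire report and a later article that reuses most of its wording, `to_prov_n(discover_derivations([wire, article]))` yields a single statement naming the article as derived from the wire report. A real system would need a stronger similarity measure and a grouping step before pairwise comparison to stay tractable on large collections.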