Extracting data models from background knowledge graphs

Bibliographic Details
Published in Knowledge-based systems Vol. 237; p. 107818
Main Authors Oliveira, Daniela; d’Aquin, Mathieu
Format Journal Article
Language English
Published Amsterdam Elsevier B.V 15.02.2022
Elsevier Science Ltd
Elsevier
Summary: Knowledge Graphs have emerged as a core technology to aggregate and publish knowledge on the Web. However, integrating knowledge from different sources, not specifically designed to be interoperable, is not a trivial task. Finding the right ontologies to model a dataset is a challenge since several valid data models exist and there is no clear agreement between them. In this paper, we propose to facilitate the selection of a data model with the RICDaM (Recommending Interoperable and Consistent Data Models) framework. RICDaM generates and ranks candidates that match entity types and properties in an input dataset. These candidates are obtained by aggregating freely available domain RDF datasets in a knowledge graph and then enriching the relationships between the graph’s entities. The entity type and object property candidates are obtained by exploiting the instances and structure of this knowledge graph to compute a score that considers both the accuracy and interoperability of the candidates. Datatype properties are predicted with a random forest model, trained on the knowledge graph properties and their values, so as to make predictions on candidate properties and rank them according to different measures. We present experiments using multiple datasets from the library domain as a use case and show that our methodology can produce meaningful candidate data models, adaptable to specific scenarios and needs.
• Heterogeneous data is challenging to integrate when several data models exist in the same domain.
• RICDaM is a framework for integrating heterogeneous data with domain knowledge graphs.
• RICDaM includes algorithms to improve precision, interoperability, and consistency.
• Experiments show the effectiveness and positive impact of these algorithms.
ISSN: 0950-7051
1872-7409
DOI: 10.1016/j.knosys.2021.107818
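
The summary above mentions ranking candidate entity types by a score that considers both accuracy and interoperability. The record does not give the actual formula, so the following is only a minimal sketch of the general idea, assuming a simple weighted sum of two scores in [0, 1]. The `Candidate` class, the `weight` parameter, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate ontology term for one entity type in the input dataset."""
    term: str                # vocabulary term, e.g. "schema:Book"
    accuracy: float          # hypothetical match-quality score in [0, 1]
    interoperability: float  # hypothetical reuse-across-datasets score in [0, 1]


def combined_score(candidate: Candidate, weight: float = 0.5) -> float:
    """Weighted sum of the two scores (an assumed form, not the paper's formula)."""
    return weight * candidate.accuracy + (1.0 - weight) * candidate.interoperability


def rank_candidates(candidates, weight: float = 0.5):
    """Return candidates sorted from highest to lowest combined score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)


# Invented illustrative scores for three real vocabulary terms.
candidates = [
    Candidate("schema:Book", accuracy=0.80, interoperability=0.90),
    Candidate("bibo:Book", accuracy=0.95, interoperability=0.40),
    Candidate("dcterms:BibliographicResource", accuracy=0.60, interoperability=0.70),
]

ranked = rank_candidates(candidates, weight=0.5)
for c in ranked:
    print(c.term, round(combined_score(c), 3))
```

Raising `weight` toward 1.0 favours the candidate with the best match quality, while lowering it favours terms that are widely shared across datasets. This mirrors the summary's point that candidate data models can be adapted to specific scenarios and needs.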