Efficient, semantics-rich transformation and integration of large datasets

Bibliographic Details
Published in	Expert systems with applications Vol. 133; pp. 198 - 214
Main Authors	Bernabé-Díaz, José Antonio; Legaz-García, María del Carmen; García, José Manuel; Fernández-Breis, Jesualdo Tomás
Format	Journal Article
Language	English
Published	New York: Elsevier Ltd (Elsevier BV), 01.11.2019
Subjects	Algorithms; Bioinformatics; Data transformation; Datasets; High performance computing; Interoperability; Semantic web; Semantics; Transformations
Online Access	Get full text
ISSN	0957-4174 1873-6793
DOI	10.1016/j.eswa.2019.05.010

Summary:
• This paper presents a new algorithm for the generation of RDF datasets.
• Its scalability is guaranteed by high performance computing techniques.
• The performance of the algorithm is tested in three different use cases.
• Experiments show that its performance is better than that of related tools.

The digital age is making more datasets available through the Internet, but their interoperability is still limited. The Semantic Web should play a fundamental role in achieving interoperable datasets. The semantic exploitation of data requires its efficient transformation into semantic formats and the integration of heterogeneous sources. Either the scalability of the existing tools for the semantic transformation of large volumes of data is limited or these tools do not provide a semantics-rich representation of the data. The goal of this work was to show how scalable semantic data transformation processes can be designed and implemented, thereby addressing the first limitation mentioned above. Here, we propose an application of high-performance computing techniques to overcome the scalability limitation. The proposed method was implemented as an upgrade of our Semantic Web Integration Tool (SWIT). Additional improvements for supporting the transformation process in SWIT are also described in this paper. We evaluated the new method by using three case studies from the areas of bioinformatics, movies and persons. The results showed a significant speed-up with respect to the original SWIT algorithm and the related tools. The lessons learnt in our work allowed us to configure semantic transformation processes efficiently.
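The summary does not describe how SWIT parallelizes the transformation, so the following is only a generic sketch of one common approach to scaling RDF generation: split the input records into chunks and map each chunk to N-Triples lines in a pool of worker processes. It is not the authors' algorithm. The namespace `http://example.org/`, the `id` column, and the names `rows_to_ntriples`, `chunked`, and `transform` are all assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(rows: List[Dict[str, str]]) -> List[str]:
    """Emit one N-Triples line per non-id column of each row."""
    triples = []
    for row in rows:
        # Assumption: every row has an "id" column identifying the resource.
        subject = f"<{BASE}resource/{quote(row['id'], safe='')}>"
        for column, value in row.items():
            if column == "id":
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def chunked(rows: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def transform(rows: Iterable[Dict[str, str]],
              chunk_size: int = 1000,
              make_executor: Callable = ProcessPoolExecutor) -> List[str]:
    """Transform chunks in parallel; map() preserves the input order.

    Pass ThreadPoolExecutor as make_executor for quick single-process tests.
    """
    with make_executor() as pool:
        results = pool.map(rows_to_ntriples, chunked(rows, chunk_size))
        return [triple for chunk in results for triple in chunk]
```

Because each chunk is transformed independently, the work can be spread across processes without shared state. A real pipeline would also need to apply ontology mappings and handle datatypes and links between resources, which this sketch leaves out.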