Opportunities for Shape-based Optimization of Link Traversal Queries

Bibliographic Details
Main Authors Tam, Bryan-Elliott; Taelman, Ruben; Colpaert, Pieter; Verborgh, Ruben
Format Journal Article
Language English
Published 01.07.2024
Abstract Data on the web is naturally unindexed and decentralized. Centralizing web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and less well understood. The two main difficulties of LTQP are the lack of a priori information about data sources and the high number of HTTP requests. Exploring decentralization-friendly ways to document unindexed networks of data sources could lead to solutions that alleviate these difficulties. RDF data shapes are widely used to validate linked data documents; it is therefore worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP that uses mappings between RDF data shapes and linked data documents, and measured its performance in a realistic setup. In this article, we present our algorithm and early results, opening opportunities for further research on shape-based optimization of link traversal queries. Our initial experiments show that, with little maintenance and work from the server, our method can reduce execution time by up to 80% and the number of links traversed by up to 97% during realistic queries. Given our early results and the descriptive power of RDF data shapes, it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.
Author Colpaert, Pieter
Taelman, Ruben
Verborgh, Ruben
Tam, Bryan-Elliott
Author_xml – sequence: 1
  givenname: Bryan-Elliott
  surname: Tam
  fullname: Tam, Bryan-Elliott
– sequence: 2
  givenname: Ruben
  surname: Taelman
  fullname: Taelman, Ruben
– sequence: 3
  givenname: Pieter
  surname: Colpaert
  fullname: Colpaert, Pieter
– sequence: 4
  givenname: Ruben
  surname: Verborgh
  fullname: Verborgh, Ruben
BackLink https://doi.org/10.48550/arXiv.2407.00998$$DView paper in arXiv
ContentType Journal Article
Copyright http://creativecommons.org/licenses/by/4.0
Copyright_xml – notice: http://creativecommons.org/licenses/by/4.0
DBID AKY
GOX
DOI 10.48550/arxiv.2407.00998
DatabaseName arXiv Computer Science
arXiv.org
DatabaseTitleList
Database_xml – sequence: 1
  dbid: GOX
  name: arXiv.org
  url: http://arxiv.org/find
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2407_00998
GroupedDBID AKY
GOX
ID FETCH-arxiv_primary_2407_009983
IEDL.DBID GOX
IngestDate Tue Sep 03 12:15:32 EDT 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-arxiv_primary_2407_009983
OpenAccessLink https://arxiv.org/abs/2407.00998
ParticipantIDs arxiv_primary_2407_00998
PublicationCentury 2000
PublicationDate 2024-07-01
PublicationDateYYYYMMDD 2024-07-01
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-07-01
  day: 01
PublicationDecade 2020
PublicationYear 2024
Score 3.8421223
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Databases
Title Opportunities for Shape-based Optimization of Link Traversal Queries
URI https://arxiv.org/abs/2407.00998
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider Cornell University