Constructing and Cleaning Identity Graphs in the LOD Cloud

Bibliographic Details
Published in Data Intelligence, Vol. 2, no. 3, pp. 323-352
Main Authors Raad, Joe; Beek, Wouter; van Harmelen, Frank; Wielemaker, Jan; Pernelle, Nathalie; Saïs, Fatiha
Format Journal Article
Language English
Published MIT Press (MIT Press Journals), One Rogers Street, Cambridge, MA 02142-1209, USA, 1 July 2020
ISSN 2641-435X
DOI 10.1162/dint_a_00057

Abstract In the absence of a central naming authority on the Semantic Web, it is common for different data sets to refer to the same thing by different names. Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Studies dating back as far as 2009 observed that the owl:sameAs property is sometimes used incorrectly. In our previous work, we presented an identity graph containing over 500 million explicit and 35 billion implied owl:sameAs statements, together with a scalable approach for automatically calculating an error degree for each identity statement. In this paper, we generate subgraphs of the overall identity graph that correspond to certain error degrees. We show that even though the Semantic Web contains many erroneous owl:sameAs statements, it is still possible to use Semantic Web data while at the same time minimising the adverse effects of misusing owl:sameAs.
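
Illustration (not part of the record): the abstract distinguishes explicit from implied owl:sameAs statements. The minimal sketch below assumes that owl:sameAs is treated as an equivalence relation, so a handful of explicit links partitions the terms into equivalence classes, and every pair of distinct terms inside a class counts as an implied identity statement. The function names, the union-find approach, the counting convention (ordered, non-reflexive pairs) and the `ex:` terms are all hypothetical choices made for this example; they are not taken from the article and may differ from how the authors build or count their identity graph.

```python
from collections import defaultdict


def identity_sets(same_as_pairs):
    """Group terms into the equivalence classes induced by explicit sameAs links."""
    parent = {}

    def find(x):
        # Register unseen terms as their own representative.
        parent.setdefault(x, x)
        # Path halving keeps the lookup trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    return list(classes.values())


def implied_pair_count(classes):
    """Count ordered pairs of distinct terms per class (an assumed convention)."""
    return sum(len(c) * (len(c) - 1) for c in classes)


# Three explicit links over five hypothetical terms.
explicit = [
    ("ex:a", "ex:b"),
    ("ex:b", "ex:c"),
    ("ex:d", "ex:e"),
]
classes = identity_sets(explicit)
print(sorted(sorted(c) for c in classes))
print(implied_pair_count(classes))
```

Under these assumptions, the three explicit links yield two classes, one with three terms and one with two, which illustrates how the number of implied statements can grow much faster than the number of explicit ones.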
Author Affiliations
  1. Raad, Joe (j.raad@vu.nl): Department of Computer Science, Vrije University, Amsterdam, The Netherlands
  2. Beek, Wouter: Department of Computer Science, Vrije University, Amsterdam, The Netherlands
  3. van Harmelen, Frank: Department of Computer Science, Vrije University, Amsterdam, The Netherlands
  4. Wielemaker, Jan: Department of Computer Science, Vrije University, Amsterdam, The Netherlands
  5. Pernelle, Nathalie
  6. Saïs, Fatiha
Copyright 2020. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Open Access Yes
Peer Reviewed Yes
Scholarly Yes
Notes Summer, 2020
PageCount 30
Subjects Graph theory; Identity; Linked Open Data; Names; Quality; Reasoning; Semantic web; Semantics