The RDF2vec family of knowledge graph embedding methods: An experimental evaluation of RDF2vec variants and their capabilities

Bibliographic Details
Published in Semantic Web Vol. 15; no. 3; pp. 845-876
Main Authors Portisch, Jan; Paulheim, Heiko
Format Journal Article
Language English
Published 14.05.2024
ISSN 1570-0844
EISSN 2210-4968
DOI 10.3233/SW-233514

Abstract Knowledge graph embeddings are a group of machine learning techniques that project the entities and relations of a knowledge graph into continuous vector spaces. RDF2vec is a scalable embedding approach rooted in the combination of random walks with a language model. It has been successfully used in various applications. Recently, multiple variants of the RDF2vec approach have been proposed, introducing variations on both the walk generation and the language modeling side. The combination of those different approaches has led to a growing family of RDF2vec variants. In this paper, we evaluate a total of twelve RDF2vec variants on a comprehensive set of benchmark models, and compare them to seven existing knowledge graph embedding methods from the family of link prediction approaches. Besides the established GEval benchmark, which introduces various downstream machine learning tasks on the DBpedia knowledge graph, we also use the new DLCC (Description Logic Class Constructors) benchmark, which consists of two gold standards: one based on DBpedia, and one based on synthetically generated graphs. The latter allows for analyzing which ontological patterns in a knowledge graph can actually be learned by different embeddings. With this evaluation, we observe that certain tailored RDF2vec variants can lead to improved performance on different downstream tasks, depending on the nature of the underlying problem, and, in particular, that they behave differently in modeling similarity and relatedness. The findings can be used to provide guidance in selecting a particular RDF2vec method for a given task.
Author Portisch, Jan (SAP SE, Germany)
Author Paulheim, Heiko (Data and Web Science Group, University of Mannheim, Germany)
Discipline Engineering
IsPeerReviewed true
IsScholarly true
License https://creativecommons.org/licenses/by/4.0