The RDF2vec family of knowledge graph embedding methods: An experimental evaluation of RDF2vec variants and their capabilities

Bibliographic Details
Published in	Semantic Web, Vol. 15, no. 3, pp. 845-876
Main Authors	Portisch, Jan; Paulheim, Heiko
Format	Journal Article
Language	English
Published	14.05.2024
ISSN	1570-0844 2210-4968
DOI	10.3233/SW-233514

Abstract	Knowledge graph embeddings are a group of machine learning techniques that project the entities and relations of a knowledge graph into continuous vector spaces. RDF2vec is a scalable embedding approach that combines random walks with a language model. It has been used successfully in various applications. Recently, multiple variants of the RDF2vec approach have been proposed, introducing variations on both the walk generation and the language modeling side. The combination of these approaches has led to a growing family of RDF2vec variants. In this paper, we evaluate a total of twelve RDF2vec variants on a comprehensive set of benchmark models, and compare them to seven existing knowledge graph embedding methods from the family of link prediction approaches. Besides the established GEval benchmark, which introduces various downstream machine learning tasks on the DBpedia knowledge graph, we also use the new DLCC (Description Logic Class Constructors) benchmark, which consists of two gold standards, one based on DBpedia and one based on synthetically generated graphs. The latter allows for analyzing which ontological patterns in a knowledge graph can actually be learned by different embedding methods. With this evaluation, we observe that certain tailored RDF2vec variants can lead to improved performance on different downstream tasks, depending on the nature of the underlying problem, and that they, in particular, behave differently in modeling similarity and relatedness. The findings can be used to provide guidance in selecting a particular RDF2vec method for a given task.
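The abstract describes RDF2vec as combining random walks with a language model. As a rough illustration of the walk-generation half only, the following minimal sketch produces token sequences from a toy graph. It is not the authors' implementation: the triples, the function names (`build_adjacency`, `random_walks`), and the parameters (`num_walks`, `depth`) are all hypothetical and chosen for this example. In an RDF2vec-style pipeline, sequences like these would then be passed to a word2vec-style language model, which is not shown here.

```python
import random
from collections import defaultdict

# Toy (subject, predicate, object) triples; purely illustrative,
# not taken from DBpedia or from the paper.
TRIPLES = [
    ("Mannheim", "locatedIn", "Germany"),
    ("Germany", "capital", "Berlin"),
    ("Berlin", "locatedIn", "Germany"),
    ("Mannheim", "hasUniversity", "UniMannheim"),
]


def build_adjacency(triples):
    """Map each subject to its list of outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walks(adjacency, start, num_walks, depth, rng):
    """Generate `num_walks` random walks of at most `depth` hops from `start`.

    Each walk is a token sequence: entity, predicate, entity, ...
    A walk ends early if it reaches a node without outgoing edges.
    """
    walks = []
    for _ in range(num_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adjacency.get(node)
            if not edges:
                break  # dead end: no outgoing edges from this node
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks


adjacency = build_adjacency(TRIPLES)
rng = random.Random(42)  # seeded only to make runs repeatable
walks = random_walks(adjacency, "Mannheim", num_walks=5, depth=2, rng=rng)
for walk in walks:
    print(" -> ".join(walk))
```

Each printed line is one "sentence" over entity and predicate tokens. The variants mentioned in the abstract differ in how such walks are generated and in how the language model consumes them; this sketch uses plain uniform random walks as an assumption.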
Author affiliations	Portisch, Jan (SAP SE, Germany); Paulheim, Heiko (Data and Web Science Group, University of Mannheim, Germany)
Discipline	Engineering
IsPeerReviewed	true
IsScholarly	true
License	https://creativecommons.org/licenses/by/4.0