MM-Transformer: A Transformer-Based Knowledge Graph Link Prediction Model That Fuses Multimodal Features

Bibliographic Details
Published in	Symmetry (Basel) Vol. 16; no. 8; p. 961
Main Authors	Wang, Dongsheng; Tang, Kangjie; Zeng, Jun; Pan, Yue; Dai, Yun; Li, Huige; Han, Bin
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.08.2024
Subjects	Graph representations; Graphical representations; Heterogeneity; James, LeBron; knowledge graph; Knowledge representation; link prediction; Methods; multimodal features; Neural networks; Prediction models; Symmetry; Tags; Transformers
Summary:	Multimodal knowledge graph completion requires integrating information from multiple modalities (such as images and text) into the structural representation of entities to improve link prediction. However, most existing studies overlook the interaction between modalities and the symmetry of the modal fusion process. To address this, the paper proposes MM-Transformer, a Transformer-based knowledge graph link prediction model that fuses multimodal features. Separate modal encoders extract structural, visual, and textual features, and symmetrical hybrid key-value calculations are performed across the features of the different modalities within the Transformer architecture. The similarities of the textual tags to the structural tags and to the visual tags are calculated and aggregated separately, and the multimodal entity representations are modeled and optimized to reduce their heterogeneity. Experimental results show that, compared with MKGformer, the current state-of-the-art (SOTA) multimodal method, MM-Transformer improves Hits@1 and Hits@10 by 1.17% and 1.39%, respectively, indicating that the proposed method can effectively address multimodal feature fusion in knowledge graph link prediction.
ISSN:	2073-8994
DOI:	10.3390/sym16080961
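
Note on the evaluation indicators: the summary reports results as Hits@1 and Hits@10. As a minimal sketch of how Hits@k is conventionally computed in link prediction (this is not the authors' evaluation code), the function below takes the 1-based rank of the correct entity for each test triple and returns the fraction ranked in the top k. The function name `hits_at_k` and the example ranks are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Return the fraction of test triples whose correct entity is ranked in the top k.

    `ranks` holds 1-based ranks (1 = the correct entity was scored highest).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for ten test triples (illustrative only, not from the paper).
example_ranks = [1, 3, 2, 15, 1, 8, 42, 1, 10, 5]

hits1 = hits_at_k(example_ranks, 1)    # 3 of the 10 ranks equal 1
hits10 = hits_at_k(example_ranks, 10)  # 8 of the 10 ranks are <= 10
print(f"Hits@1 = {hits1:.2f}, Hits@10 = {hits10:.2f}")
```

The summary does not state whether the reported 1.17% and 1.39% gains are absolute or relative differences in these fractions; the full text would need to be consulted for that.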