MM-Transformer: A Transformer-Based Knowledge Graph Link Prediction Model That Fuses Multimodal Features
Published in | Symmetry (Basel) Vol. 16; no. 8; p. 961 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | Basel MDPI AG 01.08.2024 |
Summary: | Multimodal knowledge graph completion necessitates the integration of information from multiple modalities (such as images and text) into the structural representation of entities to improve link prediction. However, most existing studies have overlooked the interaction between different modalities and the symmetry in the modal fusion process. To address this issue, this paper proposes a Transformer-based knowledge graph link prediction model (MM-Transformer) that fuses multimodal features. Different modal encoders are employed to extract structural, visual, and textual features, and symmetrical hybrid key-value calculations are performed on features from different modalities based on the Transformer architecture. The similarities of textual tags to structural tags and visual tags are calculated and aggregated, respectively, and multimodal entity representations are modeled and optimized to reduce the heterogeneity of the representations. The experimental results show that compared with the current multimodal SOTA method, MKGformer, MM-Transformer improves the Hits@1 and Hits@10 evaluation indicators by 1.17% and 1.39%, respectively, proving that the proposed method can effectively solve the problem of multimodal feature fusion in the knowledge graph link prediction task. |
---|---|
ISSN: | 2073-8994 |
DOI: | 10.3390/sym16080961 |
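
The summary reports results as Hits@1 and Hits@10. As a minimal sketch of how such a metric is commonly computed in link prediction (not the authors' evaluation code), Hits@k is the fraction of test triples for which the correct entity is ranked within the top k candidates. The function name `hits_at_k` and the example ranks below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Return the fraction of 1-based ranks that fall within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for five test triples
# (illustrative values only, not data from the paper).
example_ranks = [1, 3, 12, 1, 7]

hits1 = hits_at_k(example_ranks, 1)    # two of the five ranks equal 1
hits10 = hits_at_k(example_ranks, 10)  # four of the five ranks are <= 10
```

Because the summary does not say whether the reported 1.17% and 1.39% gains are absolute percentage points or relative improvements, the sketch makes no attempt to reproduce those figures.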