Learning Multimodal Representations by Symmetrically Transferring Local Structures
| Published in | Symmetry (Basel), Vol. 12, No. 9, p. 1504 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Basel: MDPI AG, 01.09.2020 |
| Subjects | |
Summary: Multimodal representations play an important role in multimodal learning tasks such as cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building a single common space by aligning different modalities and ignore the complementary information across modalities, such as intra-modal local structures; in other words, they address only object-level alignment and neglect structure-level alignment. To tackle this problem, we propose MTLS, a novel symmetric multimodal representation learning framework that transfers local structures across modalities. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in the intra-modal representations. A bidirectional retrieval loss based on multi-layer neural networks is used to align the two modalities. MTLS is instantiated with image and text data and shows superior performance on image-text retrieval and image clustering, outperforming state-of-the-art multimodal learning methods by up to 32% in R@1 on text-image retrieval and 16.4% in AMI on clustering.
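The abstract does not spell out the exact form of the bidirectional retrieval loss. A common formulation in image-text retrieval is a hinge-based triplet ranking loss applied symmetrically in both directions (image-to-text and text-to-image). The sketch below is a minimal PyTorch illustration of that idea under this assumption; the class name, margin value, and embedding dimensions are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalRankingLoss(nn.Module):
    """Hinge-based triplet loss applied in both retrieval directions
    (image-to-text and text-to-image) over a mini-batch.
    Illustrative sketch only; not the paper's exact loss."""

    def __init__(self, margin=0.2):
        super().__init__()
        self.margin = margin

    def forward(self, img_emb, txt_emb):
        # Cosine similarity matrix between all image/text pairs in the batch.
        img_emb = F.normalize(img_emb, dim=1)
        txt_emb = F.normalize(txt_emb, dim=1)
        scores = img_emb @ txt_emb.t()            # (B, B)
        pos = scores.diag().view(-1, 1)           # scores of matching pairs

        # Image -> text: each image should score its own caption higher
        # than any other caption in the batch by at least `margin`.
        cost_i2t = (self.margin + scores - pos).clamp(min=0)
        # Text -> image: the symmetric constraint.
        cost_t2i = (self.margin + scores - pos.t()).clamp(min=0)

        # Mask out the diagonal (the positive pairs themselves).
        mask = torch.eye(scores.size(0), dtype=torch.bool,
                         device=scores.device)
        cost_i2t = cost_i2t.masked_fill(mask, 0)
        cost_t2i = cost_t2i.masked_fill(mask, 0)
        return cost_i2t.sum() + cost_t2i.sum()


# Toy usage: embeddings produced by two modality-specific networks.
if __name__ == "__main__":
    img = torch.randn(8, 128)   # hypothetical image embeddings
    txt = torch.randn(8, 128)   # hypothetical text embeddings
    loss = BidirectionalRankingLoss(margin=0.2)(img, txt)
    print(loss.item())
```

Summing the two hinge terms penalizes misranked pairs in both retrieval directions, which is one standard way to realize the symmetric alignment the abstract describes.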
ISSN: 2073-8994
DOI: 10.3390/sym12091504