Cross-modal contrastive learning for multimodal sentiment recognition

Bibliographic Details
Published in Applied Intelligence (Dordrecht, Netherlands), Vol. 54, No. 5, pp. 4260-4276
Main Authors Yang, Shanliang; Cui, Lichao; Wang, Lei; Wang, Tao
Format Journal Article
Language English
Published New York: Springer US, 01.03.2024
Springer Nature B.V.
Summary: Multimodal sentiment recognition has attracted increasing attention in recent years because integrating information from multiple modalities can improve recognition accuracy. However, the heterogeneity arising from differences between modalities poses a significant challenge. In this paper, we propose a novel framework, Cross-Modal Contrastive Learning (CMCL), which integrates multiple contrastive learning methods with multimodal data augmentation to address the heterogeneity issue. Specifically, we establish a cross-modal contrastive learning framework that combines diversity contrastive learning, consistency contrastive learning, and sample-level contrastive learning. Diversity contrastive learning constrains modality features to distinct feature spaces, capturing the complementary nature of modality-specific features; consistency contrastive learning maps the representations of different modalities into a shared feature space, capturing their consistency. We also introduce two data augmentation techniques, random noise and modal combination, to improve the model's robustness. Experimental results show that our approach achieves state-of-the-art performance on three benchmark datasets, outperforming existing baseline models. Our work demonstrates the effectiveness of cross-modal contrastive learning and data augmentation for multimodal sentiment recognition and provides valuable insights for future research in this area.
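
The record does not include the paper's loss formulations, but the consistency objective it describes belongs to the family of InfoNCE-style contrastive losses that pull paired cross-modal representations together in a shared space while pushing mismatched pairs apart. The sketch below is a minimal PyTorch approximation of that idea, not the authors' implementation: the symmetric InfoNCE loss, the squared-cosine diversity penalty, the Gaussian-noise augmentation, and all dimensions, temperatures, and weightings are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def consistency_contrastive_loss(z_a, z_b, temperature=0.07):
        # InfoNCE-style loss: matching rows of z_a and z_b (same utterance,
        # different modalities) are positives; all other pairings in the
        # batch serve as negatives. Pulls modalities into a shared space.
        z_a = F.normalize(z_a, dim=-1)
        z_b = F.normalize(z_b, dim=-1)
        logits = z_a @ z_b.t() / temperature
        targets = torch.arange(z_a.size(0), device=z_a.device)
        # Symmetric: align a -> b and b -> a.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    def diversity_loss(z_a, z_b):
        # Illustrative "diversity" term: penalize similarity between
        # modality-specific features so they occupy distinct subspaces.
        return F.cosine_similarity(z_a, z_b, dim=-1).pow(2).mean()

    def add_random_noise(x, sigma=0.01):
        # Illustrative "random noise" augmentation: a small Gaussian
        # perturbation of the features creates an extra training view.
        return x + sigma * torch.randn_like(x)

    # Usage with dummy 128-d projections for a batch of 8 utterances.
    text_z, audio_z = torch.randn(8, 128), torch.randn(8, 128)
    loss = (consistency_contrastive_loss(text_z, add_random_noise(audio_z))
            + 0.1 * diversity_loss(text_z, audio_z))
    print(loss.item())

In the paper's terms, the first term would correspond to consistency contrastive learning on shared projections and the second to diversity contrastive learning on modality-specific projections; the 0.1 weighting is arbitrary, and the actual CMCL framework additionally uses sample-level contrastive learning and a modal-combination augmentation not sketched here.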
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-024-05355-8