MF-BERT: Multimodal Fusion in Pre-Trained BERT for Sentiment Analysis

Bibliographic Details
Published in: IEEE Signal Processing Letters, Vol. 29, pp. 454–458
Main Authors: He, Jiaxuan; Hu, Haifeng
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022

More Information
Summary: Multimodal sentiment analysis mainly concentrates on language, acoustic, and visual information. Previous work based on BERT uses only the text (language) representation to fine-tune BERT, ignoring the importance of nonverbal information. Because features extracted from a single modality may contain uncertainty, it is challenging for BERT to perform well in real-world applications. In this paper, we propose a multimodal fusion BERT that can explore the time-dependent interactions among different modalities. Additionally, prior BERT-based methods tend to train the model with a single optimizer updating all parameters. However, we argue that because BERT has been pre-trained on large corpora, it needs only slight fine-tuning. Therefore, an internal updating mechanism is introduced to avoid overfitting during training. We set two optimizers with different learning rates, one for the multimodal fusion BERT and one for the other components of the model, which enables the model to attain better parameters. The results of experiments on public datasets demonstrate that our model is superior to the baselines and achieves state-of-the-art performance.
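The dual-optimizer scheme described in the abstract is concrete enough to sketch. Below is a minimal PyTorch illustration of the idea: a pre-trained BERT updated with a small learning rate alongside a newly initialized fusion head updated with a larger one. The FusionHead module, the cross-modal attention design, the feature dimensions (74-d acoustic, 35-d visual, as in common CMU-MOSI feature sets), and the learning rates are all illustrative assumptions, not the paper's actual MF-BERT architecture or hyperparameters.

import torch
import torch.nn as nn
from transformers import BertModel


class FusionHead(nn.Module):
    """Hypothetical fusion layer: projects the acoustic and visual
    streams into the text embedding space and lets each BERT token
    attend over the nonverbal sequence via cross-modal attention."""

    def __init__(self, text_dim=768, acoustic_dim=74, visual_dim=35):
        super().__init__()
        self.acoustic_proj = nn.Linear(acoustic_dim, text_dim)
        self.visual_proj = nn.Linear(visual_dim, text_dim)
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads=8,
                                                batch_first=True)
        self.regressor = nn.Linear(text_dim, 1)  # sentiment score

    def forward(self, text_states, acoustic, visual):
        # Concatenate the projected nonverbal sequences along time and
        # use them as keys/values for the text-token queries.
        nonverbal = torch.cat([self.acoustic_proj(acoustic),
                               self.visual_proj(visual)], dim=1)
        fused, _ = self.cross_attn(text_states, nonverbal, nonverbal)
        return self.regressor(fused.mean(dim=1))


bert = BertModel.from_pretrained("bert-base-uncased")
fusion = FusionHead()

# Two optimizers with different learning rates, in the spirit of the
# paper's internal updating mechanism: the pre-trained BERT weights
# are nudged gently, while the randomly initialized fusion layers are
# trained faster. (Learning rates here are illustrative only.)
bert_opt = torch.optim.AdamW(bert.parameters(), lr=1e-5)
fusion_opt = torch.optim.AdamW(fusion.parameters(), lr=1e-3)

In a training step, one would compute the loss on the fused prediction, call loss.backward() once, and then step both optimizers, so each parameter group is updated at its own rate.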
ISSN: 1070-9908; 1558-2361
DOI: 10.1109/LSP.2021.3139856