STP-MFM: Semi-tensor product-based multi-modal factorized multilinear pooling for information fusion in sentiment analysis

Bibliographic Details
Published in Digital signal processing Vol. 145; p. 104265
Main Authors Liu, Fen; Chen, Jianfeng; Li, Kemeng; Bai, Jisheng; Tan, Weijie; Cai, Chang; Ayub, Muhammad Saad
Format Journal Article
Language English
Published Elsevier Inc 01.02.2024
Summary: Multi-modal fusion can exploit complementary information from various modalities and improve the accuracy of prediction or classification tasks. In this paper, we propose a semi-tensor product-based multi-modal factorized multilinear (STP-MFM) pooling method for information fusion in sentiment analysis. First, we extend bilinear pooling to multilinear pooling for multi-modal fusion. Next, we propose a multi-modal factorized multilinear pooling (MFM) method, which parametrizes the fusion weight tensor with the Tucker decomposition. Furthermore, we propose to use the Semi-Tensor Product (STP) in MFM to obtain more flexible and compact tensor decompositions with smaller factor sizes. The semi-tensor mode product permits the connection of two factors with different dimensionality. The proposed method thus removes the dimension-consistency requirement of matrix multiplication and expresses the information in a more compact structure that uses less memory. Most importantly, the STP leverages temporal and spatial information from video, audio, and text, producing a better representation of intra-modality correlations. We verified the proposed STP-MFM for sentiment analysis on the CMU-MOSI and IEMOCAP datasets. The experimental results indicate that the proposed method outperforms the baselines by a significant margin. It also trains faster and has lower model complexity.
• We extend the idea of bilinear pooling to multilinear pooling, which can fuse the information from more than two modalities and enhance the representation capacity of the fused features. Moreover, it avoids exponential growth in the number of parameters when more than two modalities are fused simultaneously.
• We propose a multi-modal factorized multilinear pooling method (MFM) based on Tucker tensor decomposition, which decreases data redundancy and parameter complexity while promoting fast and rich interactions between modalities in sparse modeling.
• We present a semi-tensor product-based multi-modal factorized multilinear pooling (STP-MFM) method. STP-MFM eliminates the dimension-consistency requirement of matrix multiplication and expresses the information in a more compact structure that uses less memory.
• The effectiveness of the proposed method is evaluated in experiments on two different multi-modal datasets.
ISSN: 1051-2004, 1095-4333
DOI: 10.1016/j.dsp.2023.104265
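
The summary's central claim is that the semi-tensor product removes the usual requirement that the inner dimensions of two matrices match. The following is a minimal sketch of the standard left semi-tensor product, not the authors' implementation: for A of shape m×n and B of shape p×q, with t = lcm(n, p), the product is (A ⊗ I_{t/n})(B ⊗ I_{t/p}). The function name `semi_tensor_product` and the example matrices are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import math

import numpy as np


def semi_tensor_product(a, b):
    """Left semi-tensor product of two matrices (illustrative sketch).

    a has shape (m, n) and b has shape (p, q); n and p need not be equal.
    With t = lcm(n, p), the result is kron(a, I_{t/n}) @ kron(b, I_{t/p}),
    which has shape (m * t / n, q * t / p).
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    n = a.shape[1]
    p = b.shape[0]
    t = n * p // math.gcd(n, p)  # least common multiple of n and p
    lifted_a = np.kron(a, np.eye(t // n))  # shape (m * t / n, t)
    lifted_b = np.kron(b, np.eye(t // p))  # shape (t, q * t / p)
    return lifted_a @ lifted_b


# A 1x4 row times a 2x1 column: an ordinary matrix product is undefined here.
row = np.array([[1.0, 2.0, 3.0, 4.0]])
col = np.array([[1.0], [2.0]])
stp_result = semi_tensor_product(row, col)  # shape (1, 2)

# When the inner dimensions already match, the result is the usual product.
x = np.arange(6.0).reshape(2, 3)
y = np.arange(12.0).reshape(3, 4)
same_as_matmul = np.allclose(semi_tensor_product(x, y), x @ y)
```

In the first example the row is treated as two blocks, [1, 2] and [3, 4], which are weighted by the entries of the column and summed. This block-wise behaviour is what lets factors of different sizes be connected, at the cost of the Kronecker lifting shown above.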