A multiscale neural architecture search framework for multimodal fusion

Bibliographic Details
Published in: Information Sciences, Vol. 679, p. 121005
Main Authors: Lv, Jindi; Sun, Yanan; Ye, Qing; Feng, Wentao; Lv, Jiancheng
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.09.2024

Summary: Multimodal fusion, a machine learning technique, significantly enhances decision-making by leveraging complementary information extracted from different data modalities. The success of multimodal fusion relies heavily on the design of the fusion scheme. However, this process traditionally depends on manual expertise and exhaustive trials. To tackle this challenge, researchers have studied DARTS-based Neural Architecture Search (NAS) variants to automate the search for fusion schemes. In this paper, we present theoretical and empirical evidence highlighting the presence of catastrophic search bias in DARTS-based multimodal fusion methods. This bias traps the search in a deceptively optimal child network, rendering the entire search process ineffective. To circumvent this phenomenon, we introduce a novel NAS framework for multimodal fusion, featuring a robust search strategy and a meticulously designed multi-scale fusion search space. Significantly, the proposed framework is capable of capturing modality-specific information across multiple scales while automatically balancing intra-modal and inter-modal information. We conduct extensive experiments on three commonly used multimodal classification tasks from different domains and compare the proposed framework against state-of-the-art approaches. The experimental results demonstrate the superior robustness and high efficiency of the proposed framework.

Highlights:
• This paper presents the first theoretical and empirical evidence demonstrating that DARTS suffers from catastrophic search bias in multimodal fusion, rendering the entire search ineffective. We term this phenomenon the "Matthew Effect" and give a profound analysis.
• A novel NAS framework for multimodal fusion is proposed, which features a robust search strategy and a multi-scale fusion search space. The framework leverages the single-path one-shot NAS algorithm as the search strategy, fully circumventing the occurrence of the "Matthew Effect".
• The extensive experimental results effectively showcase the exceptional robustness and high efficiency of the proposed framework.
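The abstract names the single-path one-shot (SPOS) NAS algorithm as the search strategy used to sidestep the "Matthew Effect" of DARTS-style weight co-adaptation. The sketch below illustrates only that general weight-sharing idea on a toy two-modality fusion supernet; all module names, candidate operations, dimensions, and the random-search ranking step are illustrative assumptions, not the paper's actual multi-scale search space or evaluation protocol.

```python
# Hedged sketch: single-path one-shot (SPOS) weight sharing on a toy
# two-modality fusion supernet. Everything here is illustrative, not the
# paper's actual search space.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Holds candidate ops; only the sampled one runs in a forward pass."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                   # skip connection
            nn.Linear(dim, dim),                             # linear transform
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),   # non-linear transform
        ])

    def forward(self, x, idx):
        return self.ops[idx](x)

class FusionSupernet(nn.Module):
    """Two modality branches plus one fusion node, each with searchable ops."""
    def __init__(self, dim=32, num_classes=4):
        super().__init__()
        self.branch_a = MixedOp(dim)
        self.branch_b = MixedOp(dim)
        self.fusion = MixedOp(dim * 2)       # acts on concatenated features
        self.head = nn.Linear(dim * 2, num_classes)

    def forward(self, xa, xb, path):
        ha = self.branch_a(xa, path[0])
        hb = self.branch_b(xb, path[1])
        fused = self.fusion(torch.cat([ha, hb], dim=-1), path[2])
        return self.head(fused)

def sample_path(num_nodes=3, num_ops=3):
    # Uniform path sampling decouples supernet training from architecture
    # choice, which is the core idea SPOS uses to avoid co-adaptation bias.
    return [random.randrange(num_ops) for _ in range(num_nodes)]

net = FusionSupernet()
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for step in range(100):                      # toy training loop on random data
    xa, xb = torch.randn(16, 32), torch.randn(16, 32)
    y = torch.randint(0, 4, (16,))
    path = sample_path()                     # one random single path per batch
    loss = F.cross_entropy(net(xa, xb, path), y)
    opt.zero_grad(); loss.backward(); opt.step()

# After supernet training, candidate child networks are ranked with the shared
# weights; plain random search on random data stands in for a real evaluator.
best = min((sample_path() for _ in range(20)),
           key=lambda p: F.cross_entropy(
               net(torch.randn(64, 32), torch.randn(64, 32), p),
               torch.randint(0, 4, (64,))).item())
print("selected path (op index per node):", best)
```

In a real setting the ranking step would evaluate each sampled path on held-out validation data (or use an evolutionary search over paths, as in the original SPOS work), and the selected child network would then be retrained from scratch.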
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2024.121005