A Neural Architecture Search for Automated Multimodal Learning

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 207, p. 118051
Main Authors: Singh, Anuraj; Nair, Haritha
Format: Journal Article
Language: English
Published: Elsevier Ltd, 30.11.2022

Summary: The boom of artificial intelligence over the past decade is owed to the research and development of deep learning and, moreover, of accessible deep learning. However, the goal of Artificial General Intelligence (AGI) cannot be achieved with application-specific, parameter-sensitive neural networks that must be defined and tuned for every use case. General intelligence also involves understanding different types of data rather than relying on dedicated models for each functionality. Automating machine learning while also generalizing over multiple modalities therefore has great potential to move AGI research forward. We propose a generalizable algorithm, Multimodal Neural Architecture Search (MNAS), which operates on multiple modalities and performs architecture search to create neural networks that classify multiple types of data for multiclass outputs. The work automates the development of a fusion architecture by building on the existing literature of multimodal learning and neural architecture search. The controller network that predicts the architecture is designed around a reward model in which the reward depends on the accuracies of the individual networks corresponding to each modality involved. The work shows good results, with accuracy comparable to both unimodal classification on the same data and manually created multimodal architectures, in experiments on a multiclass classification problem over image and text modalities. It also uses a shared-parameter search graph, keeping computational complexity lower than that of several other neural architecture search algorithms.
ISSN: 0957-4174; 1873-6793
DOI: 10.1016/j.eswa.2022.118051
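
To illustrate the reward-driven search described in the summary, the following is a minimal, hypothetical sketch: a softmax controller samples one operation per decision in the search graph, the reward combines the validation accuracies of the per-modality (image and text) networks, and the controller is updated with a REINFORCE-style gradient and a moving-average baseline. The constants, names, and the stand-in evaluate() function are illustrative assumptions, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    NUM_DECISIONS = 4        # e.g. layer-wise choices in the shared search graph
    NUM_CANDIDATE_OPS = 5    # candidate operations per decision
    LEARNING_RATE = 0.05

    # Controller parameters: one softmax distribution over candidate ops per decision.
    logits = np.zeros((NUM_DECISIONS, NUM_CANDIDATE_OPS))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def sample_architecture(logits):
        # Sample one operation index per decision step from the controller.
        return [rng.choice(NUM_CANDIDATE_OPS, p=softmax(step)) for step in logits]

    def evaluate(architecture):
        # Stand-in for training/evaluating the sampled multimodal network.
        # Returns hypothetical validation accuracies for the image and text
        # branches; in the actual algorithm these would come from the networks
        # built from the shared-parameter search graph.
        return {"image": rng.uniform(0.6, 0.9), "text": rng.uniform(0.6, 0.9)}

    baseline = 0.0
    for iteration in range(100):
        arch = sample_architecture(logits)
        accuracies = evaluate(arch)

        # The reward depends on the accuracies of the individual per-modality
        # networks (here simply their mean, one plausible combination).
        reward = float(np.mean(list(accuracies.values())))
        baseline = 0.9 * baseline + 0.1 * reward      # moving-average baseline
        advantage = reward - baseline

        # REINFORCE-style update: raise the probability of the sampled ops in
        # proportion to the advantage (grad of log-softmax = one_hot - probs).
        for step, op in enumerate(arch):
            probs = softmax(logits[step])
            grad = -probs
            grad[op] += 1.0
            logits[step] += LEARNING_RATE * advantage * grad

The shared-parameter search graph mentioned in the summary would correspond to reusing child-network weights across sampled candidates, which is what keeps the search cheaper than training every architecture from scratch.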