A Neural Architecture Search for Automated Multimodal Learning

Bibliographic Details
Published in	Expert systems with applications Vol. 207; p. 118051
Main Authors	Singh, Anuraj; Nair, Haritha
Format	Journal Article
Language	English
Published	Elsevier Ltd 30.11.2022
Subjects	Automation; Deep learning; Multimodal learning; Neural architecture search

Summary:	The boom in artificial intelligence over the past decade owes much to the research and development of deep learning and, moreover, to deep learning becoming accessible. However, the goal of Artificial General Intelligence (AGI) cannot be achieved with application-specific, parameter-sensitive neural networks that must be defined and tuned for every use case. General intelligence also involves understanding different types of data, rather than relying on a dedicated model for each functionality. Thus, automating machine learning while also generalizing over multiple modalities has great potential to move AGI research forward. We propose a generalizable algorithm, Multimodal Neural Architecture Search (MNAS), which works on multiple modalities and performs architecture search to create neural networks that classify multiple types of data into multiclass outputs. The work automates the development of a fusion architecture by building upon existing literature on multimodal learning and neural architecture search. The controller network, which predicts the architecture, is designed around a reward model in which the reward depends on the accuracies of the individual networks corresponding to each modality involved. Experiments on a multiclass classification problem with image and text modalities show good results, with accuracy comparable to both unimodal classification on the same data and manually created multimodal architectures. MNAS also uses a shared-parameter search graph, which keeps its computational complexity lower than that of several other neural architecture search algorithms.
ISSN:	0957-4174; 1873-6793
DOI:	10.1016/j.eswa.2022.118051
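The summary says only that the MNAS controller's reward depends on the accuracies of the individual per-modality networks; it does not give the formula. The sketch below assumes, purely for illustration, that the reward is an (optionally weighted) mean of per-modality validation accuracies, and pairs it with an exponential-moving-average baseline of the kind commonly used in REINFORCE-trained architecture-search controllers such as ENAS. The names `multimodal_reward` and `MovingBaseline`, the equal default weights, and the decay value are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


def multimodal_reward(
    modality_acc: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Hypothetical reward: weighted mean of per-modality validation accuracies.

    Assumption for illustration only; the paper's exact reward is not given
    in this record.
    """
    if not modality_acc:
        raise ValueError("at least one modality accuracy is required")
    if weights is None:
        # Assumed default: every modality contributes equally.
        weights = {m: 1.0 for m in modality_acc}
    total_weight = math.fsum(weights[m] for m in modality_acc)
    weighted_sum = math.fsum(weights[m] * acc for m, acc in modality_acc.items())
    return weighted_sum / total_weight


@dataclass
class MovingBaseline:
    """Exponential moving average of past rewards (hypothetical helper)."""

    decay: float = 0.95
    value: Optional[float] = None

    def advantage(self, reward: float) -> float:
        # On the first call, initialise the baseline to the reward itself.
        if self.value is None:
            self.value = reward
        adv = reward - self.value
        # Update the baseline after computing the advantage.
        self.value = self.decay * self.value + (1.0 - self.decay) * reward
        return adv


if __name__ == "__main__":
    baseline = MovingBaseline(decay=0.9)
    r = multimodal_reward({"image": 0.80, "text": 0.70})
    print(r, baseline.advantage(r))
```

In a REINFORCE-style controller, the baseline-subtracted advantage rather than the raw reward would scale the log-probability gradient of the sampled architecture, which reduces the variance of the update; whether MNAS uses such a baseline is not stated in this record.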