Bayesian optimization on graph-structured search spaces: Optimizing deep multimodal fusion architectures


Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 298, pp. 80–89
Main Authors: Ramachandram, Dhanesh; Lisicki, Michal; Shields, Timothy J.; Amer, Mohamed R.; Taylor, Graham W.
Format: Journal Article
Language: English
Published: Elsevier B.V., 12.07.2018

More Information
Summary: A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs like video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyperparameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose two methods to compute structural similarities in the search space of tree-structured multimodal architectures, and demonstrate their effectiveness on two challenging multimodal human activity recognition problems.
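The sketch below is not the paper's method: it uses a simple subtree-overlap score as a stand-in for the two structural similarity measures the authors propose, purely to illustrate the general idea of turning a similarity between tree-structured fusion architectures into a kernel and plugging it into a Gaussian-process surrogate for Bayesian optimization. All architecture encodings, scores, and numbers are hypothetical.

```python
# Illustrative sketch only (assumed, not from the paper): a crude structural
# similarity between tree-structured fusion architectures, exponentiated into
# a kernel and used in a standard GP posterior, as one might do inside a
# Bayesian-optimization loop over a discrete architecture search space.
import numpy as np

def subtrees(tree):
    """List all subtrees of a nested-tuple architecture.

    Leaves are modality names (strings); internal nodes are tuples whose
    first element labels the fusion operation.
    """
    if isinstance(tree, str):
        return [tree]
    out = [tree]
    for child in tree[1:]:
        out.extend(subtrees(child))
    return out

def similarity(t1, t2):
    """Normalized count of shared subtrees (a simple structural similarity)."""
    s1, s2 = subtrees(t1), subtrees(t2)
    shared = sum(1 for s in s1 if s in s2)
    return shared / max(len(s1), len(s2))

def kernel_matrix(trees, length_scale=1.0):
    """Map pairwise structural similarities to a kernel matrix."""
    n = len(trees)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = 1.0 - similarity(trees[i], trees[j])   # pseudo-distance in [0, 1]
            K[i, j] = np.exp(-d / length_scale)
    return K

def gp_posterior(K_train, K_cross, k_star, y, noise=1e-3):
    """Standard GP posterior mean and variance for candidate architectures."""
    L = np.linalg.cholesky(K_train + noise * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    var = k_star - np.sum(v ** 2, axis=0)
    return mean, var

# Toy candidates: three ways to fuse video/audio/pose streams.
candidates = [
    ("fuse", "video", "audio", "pose"),                # flat (early-style) fusion
    ("fuse", ("fuse", "video", "audio"), "pose"),      # staged fusion
    ("fuse", "video", ("fuse", "audio", "pose")),      # alternative staging
]
observed = candidates[:2]
scores = np.array([0.71, 0.78])                        # hypothetical validation accuracies

K = kernel_matrix(observed)
k_cross = np.array([[np.exp(-(1.0 - similarity(candidates[2], t))) for t in observed]])
mean, var = gp_posterior(K, k_cross, np.array([1.0]), scores)
print(f"unseen candidate: posterior mean={mean[0]:.3f}, variance={var[0]:.3f}")
```

In a full Bayesian-optimization loop, the posterior mean and variance would feed an acquisition function (e.g., expected improvement) that selects the next architecture to train; the paper's contribution lies in the structural similarity measures themselves, which this toy subtree-overlap score only gestures at.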
ISSN:0925-2312
1872-8286
DOI:10.1016/j.neucom.2017.11.071