Multimodal fusion using dynamic hybrid models

Bibliographic Details
Published in: IEEE Winter Conference on Applications of Computer Vision, pp. 556-563
Main Authors: Amer, Mohamed R.; Siddiquie, Behjat; Khan, Saad; Divakaran, Ajay; Sawhney, Harpreet
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2014
Summary: We propose a novel hybrid model that exploits the strength of discriminative classifiers along with the representational power of generative models. Our focus is on detecting multimodal events in time-varying sequences. Discriminative classifiers have been shown to achieve higher performance than the corresponding generative likelihood-based classifiers. On the other hand, generative models learn a rich, informative space that allows for data generation and joint feature representation, which discriminative models lack. We employ a deep temporal generative model for unsupervised learning of a shared representation across multiple modalities with time-varying data. The temporal generative model accounts for short-term temporal phenomena and allows missing data to be filled in by generating data within or across modalities. The hybrid model augments the temporal generative model with a temporal discriminative model for event detection and classification, which enables modeling of long-range temporal dynamics. We evaluate our approach on audio-visual datasets (AVEC, AVLetters, and CUAVE) and demonstrate its superiority over the state of the art.
ISSN: 1550-5790, 2642-9381
DOI: 10.1109/WACV.2014.6836053
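
To make the hybrid design in the summary above concrete, here is a minimal PyTorch sketch of one plausible reading of it: a shared encoder with per-modality decoders stands in for the deep temporal generative model (giving a joint representation and within/cross-modality reconstruction), and a GRU classifier stands in for the temporal discriminative model. The class name, layer choices, and dimensions are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch only: the autoencoder and GRU stand in for the paper's
# deep temporal generative model and temporal discriminative model; all names
# and sizes are assumptions.
import torch
import torch.nn as nn

class HybridAVModel(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=64, shared_dim=128, n_classes=10):
        super().__init__()
        # Generative stand-in: encode both modalities into a shared code and
        # decode each modality back, so a missing modality can be regenerated.
        self.encoder = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, shared_dim), nn.ReLU())
        self.audio_decoder = nn.Linear(shared_dim, audio_dim)
        self.visual_decoder = nn.Linear(shared_dim, visual_dim)
        # Discriminative stand-in: a recurrent head over the shared codes
        # models long-range temporal dynamics for event classification.
        self.rnn = nn.GRU(shared_dim, shared_dim, batch_first=True)
        self.classifier = nn.Linear(shared_dim, n_classes)

    def forward(self, audio, visual):
        # audio: (batch, time, audio_dim); visual: (batch, time, visual_dim)
        z = self.encoder(torch.cat([audio, visual], dim=-1))
        audio_rec = self.audio_decoder(z)    # within/cross-modality generation
        visual_rec = self.visual_decoder(z)
        h, _ = self.rnn(z)
        logits = self.classifier(h[:, -1])   # one event label per sequence
        return logits, audio_rec, visual_rec

# Toy usage: a joint objective combining the discriminative (cross-entropy)
# and generative (reconstruction) terms.
model = HybridAVModel()
audio = torch.randn(2, 30, 40)
visual = torch.randn(2, 30, 64)
logits, a_rec, v_rec = model(audio, visual)
loss = (nn.functional.cross_entropy(logits, torch.tensor([1, 3]))
        + nn.functional.mse_loss(a_rec, audio)
        + nn.functional.mse_loss(v_rec, visual))
loss.backward()

In this reading, the reconstruction terms are what let the model fill in a missing modality from the shared code, while the recurrent head supplies the event label; training the two losses jointly is one simple way to couple the generative and discriminative halves.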