A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Another popular approach to model the multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new t...

Full description

Saved in:

Bibliographic Details
Published in	IEEE transactions on pattern analysis and machine intelligence Vol. 38; no. 6; pp. 1056 - 1069
Main Authors	Yin Zheng, Yu-Jin Zhang, Larochelle, Hugo
Format	Journal Article
Language	English
Published	United States IEEE 01.06.2016 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Computational modeling Data models Deep neural network Joints Mathematical model Multimodal data modeling Neural autoregressive model Neural networks Topic model Training Visualization neural autoregressive model topic model Multimodal data modeling deep neural network
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Another popular approach to model the multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the learned hidden topic features and show how to employ it to learn a joint representation from image visual words, annotation words and class label information. We test our model on the LabelMe and UIUC-Sports data sets and show that it compares favorably to other topic models. Second, we propose a deep extension of our model and provide an efficient way of training the deep model. Experimental results show that our deep model outperforms its shallow version and reaches state-of-the-art performance on the Multimedia Information Retrieval (MIR) Flickr data set.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	0162-8828 1939-3539 2160-9292
DOI:	10.1109/TPAMI.2015.2476802