Multilabel Image Classification With Regional Latent Semantic Dependencies

Bibliographic Details
Published in	IEEE Transactions on Multimedia, Vol. 20, no. 10, pp. 2801-2813
Main Authors	Zhang, Junjie; Wu, Qi; Shen, Chunhua; Zhang, Jian; Lu, Jianfeng
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2018
Subjects	Annotations; Classification; Convolution; deep neural network; Image classification; Labels; Mathematical models; Multilabel image classification; Neural networks; Predictive models; Proposals; Recurrent neural networks; semantic dependence; Semantics; State of the art; Upper bounds; Visual discrimination; Visualization

Summary:	Deep convolutional neural networks (CNNs) have demonstrated strong performance on single-label image classification, and progress has also been made in applying CNN methods to multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to multilabel image classification exploit the label dependencies in an image at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are then passed to recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model outperforms the state-of-the-art models, especially for predicting small objects occurring in the images. We also set up an upper bound model (RLSD+ft-RPN) that uses bounding-box coordinates during training, and the experimental results show that our RLSD can approach the upper bound without using bounding-box annotations, which is a more realistic setting in practice.
ISSN:	1520-9210, 1941-0077
DOI:	10.1109/TMM.2018.2812605
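
Illustrative sketch (not from the paper): the summary describes passing localized regions to a recurrent network and predicting labels at the regional level. The toy code below shows one way such a pipeline could be wired. Everything in it is an assumption made for illustration: the localization step is skipped and regions are given as plain feature vectors, a simple tanh RNN stands in for whatever recurrent unit the authors use, the weights are random, and the max-pooling that fuses regional scores into image-level scores is a guess, since the summary does not say how regional outputs are combined. All function and variable names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def region_label_scores(feat, W_in, W_rec, W_out, steps=3):
    """Unroll a plain tanh RNN on one region feature and keep, per label,
    the highest logit emitted at any step (assumed aggregation)."""
    hidden = [0.0] * len(W_rec)
    best = [-math.inf] * len(W_out)
    for _ in range(steps):
        pre = [a + b for a, b in zip(matvec(W_in, feat), matvec(W_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        logits = matvec(W_out, hidden)
        best = [max(b, l) for b, l in zip(best, logits)]
    return best


def image_label_probs(region_feats, W_in, W_rec, W_out, steps=3):
    """Max-pool region-level logits into image-level label probabilities
    (the pooling rule is an assumption, not taken from the summary)."""
    per_region = [region_label_scores(f, W_in, W_rec, W_out, steps)
                  for f in region_feats]
    pooled = [max(col) for col in zip(*per_region)]
    return [1.0 / (1.0 + math.exp(-z)) for z in pooled]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D, H, L = 4, 5, 3  # toy feature size, hidden size, and number of labels
W_in = random_matrix(H, D, rng)
W_rec = random_matrix(H, H, rng)
W_out = random_matrix(L, H, rng)
regions = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(6)]
probs = image_label_probs(regions, W_in, W_rec, W_out)
```

With max-pooling, a label is scored by its most confident region, so a small object that dominates one region is not averaged away by the rest of the image. That is one plausible reading of why region-level modeling could help with small objects, not a claim about the authors' implementation.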