Image Annotation by Propagating Labels from Semantic Neighbourhoods

Bibliographic Details
Published in: International Journal of Computer Vision, Vol. 121, No. 1, pp. 126-148
Main Authors: Verma, Yashaswi; Jawahar, C. V.
Format: Journal Article
Language: English
Published: New York: Springer US, 2017
Summary: Automatic image annotation aims at predicting a set of semantic labels for an image. Because of the large annotation vocabulary, the number of images per label varies widely (“class-imbalance”). Additionally, owing to the limitations of human annotation, many images are not annotated with all of their relevant labels (“incomplete-labelling”). These two issues affect the performance of most existing image annotation models. In this work, we propose the 2-pass k-nearest neighbour (2PKNN) algorithm, a two-step variant of the classical k-nearest neighbour algorithm that addresses these issues in the image annotation task. The first step of 2PKNN uses “image-to-label” similarities, while the second step uses “image-to-image” similarities, thus combining the benefits of both. We also propose a metric learning framework over 2PKNN, formulated in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label data. In addition to the features provided by Guillaumin et al. (2009), which are used by almost all recent image annotation methods, we benchmark new features, including features extracted from a generic convolutional neural network model and features computed using modern encoding techniques. We also learn linear and kernelized cross-modal embeddings over different feature combinations to reduce the semantic gap between visual features and textual labels. Extensive evaluations on four image annotation datasets (Corel-5K, ESP-Game, IAPR-TC12 and MIRFlickr-25K) demonstrate that our method achieves promising results and establishes a new state of the art on the prevailing image annotation datasets.
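The summary describes the two passes of 2PKNN only at a high level. The following is a minimal Python sketch of that idea, not the paper's actual implementation: the Euclidean distance, the exponential similarity weight, and the names used here (two_pass_knn, k1) are illustrative assumptions.

import numpy as np

def two_pass_knn(test_feat, train_feats, train_labels, vocab_size, k1=5):
    # Distance from the test image to every training image
    # (Euclidean here as a placeholder; the paper learns a metric).
    dists = np.linalg.norm(train_feats - test_feat, axis=1)

    # Pass 1 ("image-to-label"): for each label, keep the k1 training
    # images carrying that label that lie closest to the test image.
    # Pooling a fixed number of images per label keeps rare labels
    # represented, which is how 2PKNN counters class-imbalance.
    neighbourhood = set()
    for label in range(vocab_size):
        candidates = [i for i, labs in enumerate(train_labels) if label in labs]
        candidates.sort(key=lambda i: dists[i])
        neighbourhood.update(candidates[:k1])

    # Pass 2 ("image-to-image"): score each label by similarity-weighted
    # votes from the pooled semantic neighbourhood.
    scores = np.zeros(vocab_size)
    for i in neighbourhood:
        weight = np.exp(-dists[i])  # simple similarity weight (an assumption)
        for label in train_labels[i]:
            scores[label] += weight
    return scores  # annotate the image with the highest-scoring labels

In terms of this sketch, the paper's metric learning over 2PKNN would roughly correspond to replacing the fixed Euclidean distance with a distance learned in a large-margin set-up from multi-label training data.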
ISSN: 0920-5691
EISSN: 1573-1405
DOI: 10.1007/s11263-016-0927-0