Learning Deep Hierarchical Visual Feature Coding
Published in | IEEE Transactions on Neural Networks and Learning Systems, Vol. 25, no. 12, pp. 2212-2225 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | United States; IEEE; 01.12.2014; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag-of-words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatial aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact and the model's inference is fast compared with sparse coding methods. The low-level representations of descriptors learned using this method result in generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer. |
---|---|
ISSN: | 2162-237X, 2162-2388 |
DOI: | 10.1109/TNNLS.2014.2307532 |