Part-Stacked CNN for Fine-Grained Visual Categorization

In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the finegrained r...

Full description

Saved in:
Bibliographic Details
Published in2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 1173 - 1182
Main Authors Shaoli Huang, Zhe Xu, Dacheng Tao, Ya Zhang
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.06.2016
Subjects
Online AccessGet full text
ISSN1063-6919
DOI10.1109/CVPR.2016.132

Cover

Loading…
More Information
Summary:In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the finegrained recognition process by modeling subtle differences from object parts. Based on manually-labeled strong part annotations, the proposed architecture consists of a fully convolutional network to locate multiple object parts and a two-stream classification network that encodes object-level and part-level cues simultaneously. By adopting a set of sharing strategies between the computation of multiple object parts, the proposed architecture is very efficient running at 20 frames/sec during inference. Experimental results on the CUB-200-2011 dataset reveal the effectiveness of the proposed architecture, from multiple perspectives of classification accuracy, model interpretability, and efficiency. Being able to provide interpretable recognition results in realtime, the proposed method is believed to be effective in practical applications.
ISSN:1063-6919
DOI:10.1109/CVPR.2016.132