Learning part-based mid-level representation for visual recognition

Bibliographic Details
Published in	Neurocomputing (Amsterdam) Vol. 275; pp. 2126-2136
Main Authors	Yuan, Baodi, Tu, Jian, Zhao, Rui-Wei, Zheng, Yingbin, Jiang, Yu-Gang
Format	Journal Article
Language	English
Published	Elsevier B.V. 31.01.2018
Subjects	Event recognition; Learning; Mid-level representation; Part filter; Scene recognition

Summary:	A large semantic gap exists between low-level image representations and high-level semantics. To bridge this gap, this paper proposes a mid-level image representation for visual recognition, in which an image is represented by the response maps of local part filters. Each dimension of the mid-level representation indicates the likelihood of seeing a part in the input image. The part filters are trained on external data and need not be fine-tuned on test data. To eliminate possibly redundant similar parts that occur in different objects or scenes, we perform unsupervised clustering for part refinement. To reduce the expensive computation of the part filters' response maps, we further leverage sparse coding to accelerate feature extraction, making it ten times faster without significantly compromising recognition accuracy. We evaluate the proposed mid-level representation on both image and video content recognition tasks and attain state-of-the-art results.
ISSN:	0925-2312 1872-8286
DOI:	10.1016/j.neucom.2017.10.062