Coding Visual Features Extracted From Video Sequences

Bibliographic Details
Published in	IEEE Transactions on Image Processing, Vol. 23, no. 5, pp. 2262-2276
Main Authors	Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano
Format	Journal Article
Language	English
Published	New York, NY: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2014
Subjects	Applied sciences; Coding; Detection, estimation, filtering, equalization, prediction; Encoding; Exact sciences and technology; Feature extraction; Image coding; Image processing; Information theory; Information, signal and communications theory; Networks; Object recognition; Pattern recognition; Redundancy; Searching; Signal and communications theory; Signal processing; Signal, noise; Software; Telecommunications and information theory; Tracking; Vectors; Video coding; Video sequences; Visual; Visualization; SURF; Local descriptors; SIFT; Visual features; Performance evaluation; Content based retrieval; Video signal; Implementation; Optimization; Video signal processing; Interframe encoding; Image sequence; Target detection; Signal detection; Target tracking; Control system; Rate distortion theory; Visual search; Object detection; Metric; Sensor array; Central unit
Summary:	Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
ISSN:	1057-7149 (print), 1941-0042 (electronic)
DOI:	10.1109/TIP.2014.2312617