Unsupervised Segmentation of Stereoscopic Video Objects: Constrained Segmentation Fusion Versus Greedy Active Contours

Bibliographic Details
Published in: Journal of Signal Processing Systems, Vol. 81, No. 2, pp. 153–181
Main Authors: Ntalianis, Klimis S.; Doulamis, Anastasios D.; Doulamis, Nikolaos D.; Mastorakis, Nikos E.; Drigas, Athanasios S.
Format: Journal Article
Language: English
Published: New York: Springer US, 01.11.2015
Summary: In this paper two efficient unsupervised video object segmentation approaches are proposed and thoroughly compared. Both methods exploit depth information estimated from stereoscopic pairs. Depth is a more effective semantic descriptor of visual content than color, since an object usually lies on a single depth plane. However, depth information fails to accurately represent the contours of an object, mainly due to erroneous disparity estimation and occlusion issues. For this reason, the first approach projects color segments onto depth information in order to address the limitations of both depth and color segmentation: color segmentation usually over-partitions an object into several regions, while depth fails to precisely represent object contours. Depth information is produced through an occlusion-compensated disparity field, from which a depth map is generated. Color segmentation, in turn, is accomplished with a modified version of the Multiresolution Recursive Shortest Spanning Tree segmentation algorithm (M-RSST). In the first approach, “Constrained Fusion of Color Segments” (CFCS), a color segments map is created by applying the M-RSST to one of the stereoscopic channels; video objects are then extracted by fusing color segments according to depth-similarity criteria. The second method also utilizes the depth segments map. In particular, an active contour is automatically initialized onto the boundary of each depth segment, which usually differs from the video object’s boundary. Initialization is accomplished by a fitness function that considers different color areas and preserves the shapes of the depth segments’ boundaries. For acceleration, each point of the active contour is associated with an “attractive edge” point, and a greedy approach is incorporated so that the active contour converges to its final position. Several experiments on real-life stereoscopic sequences are performed, and extensive comparisons in terms of speed and accuracy indicate the promising performance of both methods.
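To make the two pipelines described in the abstract more concrete, the sketch below (Python with NumPy) illustrates the two core ideas: fusing color segments into video objects by depth similarity (the CFCS approach) and one greedy update step of an active contour whose points are pre-assigned to “attractive edge” points. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function names, the median-depth similarity criterion, and the simplified energy terms are assumptions made here for clarity.

```python
import numpy as np

def fuse_color_segments_by_depth(color_labels, depth_map, depth_labels):
    """CFCS-style fusion sketch: assign every color segment to the depth
    segment whose median depth is closest to the color segment's own
    median depth, yielding a video-object mask.
    (Illustrative only; the paper's actual similarity criteria may differ.)"""
    depth_ids = np.unique(depth_labels)
    # Median depth of each depth segment (candidate video object).
    depth_medians = {d: np.median(depth_map[depth_labels == d]) for d in depth_ids}

    object_mask = np.zeros_like(depth_labels)
    for c in np.unique(color_labels):
        region = color_labels == c
        region_depth = np.median(depth_map[region])
        # Fuse the color segment into the most depth-similar object.
        best = min(depth_ids, key=lambda d: abs(depth_medians[d] - region_depth))
        object_mask[region] = best
    return object_mask

def greedy_snake_step(contour, attractive_edges, smoothness=0.5, radius=1):
    """One greedy iteration: each contour point moves, within a small
    neighbourhood, to the position that best trades off proximity to its
    pre-assigned "attractive edge" point against local smoothness.
    (Simplified energy terms; assumptions, not the paper's exact formulation.)"""
    contour = np.asarray(contour, dtype=float)
    attractive_edges = np.asarray(attractive_edges, dtype=float)
    new_contour = contour.copy()
    n = len(contour)
    for i in range(n):
        y, x = contour[i]
        ey, ex = attractive_edges[i]
        prev_pt = new_contour[i - 1]        # already-updated neighbour
        next_pt = contour[(i + 1) % n]
        best, best_cost = (y, x), np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                cy, cx = y + dy, x + dx
                edge_cost = np.hypot(cy - ey, cx - ex)   # pull toward edge point
                smooth_cost = np.hypot(cy - 0.5 * (prev_pt[0] + next_pt[0]),
                                       cx - 0.5 * (prev_pt[1] + next_pt[1]))
                cost = edge_cost + smoothness * smooth_cost
                if cost < best_cost:
                    best, best_cost = (cy, cx), cost
        new_contour[i] = best
    return new_contour
```

The per-point greedy update mirrors the acceleration idea in the abstract: instead of minimizing a global snake energy, each contour point independently chooses the locally cheapest move toward its attractive edge point, which is what makes the greedy variant fast.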
ISSN: 1939-8018, 1939-8115
DOI: 10.1007/s11265-014-0921-0