Joint semantic and geometric segmentation of videos with a stage model

Bibliographic Details
Published in	IEEE Winter Conference on Applications of Computer Vision, pp. 737-744
Main Authors	Buyu Liu, Xuming He, Stephen Gould
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2014
Subjects	Computational modeling; Feature extraction; Joints; Labeling; Semantics; Smoothing methods; Videos

Summary:	We address the problem of geometrically and semantically consistent video segmentation for outdoor scenes. With no assumption on camera movement, we jointly model the semantic-geometric class of spatio-temporal regions (supervoxels) and the geometric scene layout in each frame. Our main contribution is to propose a stage scene model to efficiently capture the dependency between the semantic and geometric labels. We build a unified CRF model on supervoxel labels and stage parameters, and design an alternating inference algorithm to minimize the resulting energy function. We also extend smoothing based on hierarchical image segmentation to the spatio-temporal setting and show that it achieves better performance than a pairwise random field model. Our method is evaluated on the CamVid dataset and achieves state-of-the-art per-pixel as well as per-class accuracy in predicting both semantic and geometric labels.
ISSN:	1550-5790, 2642-9381
DOI:	10.1109/WACV.2014.6836029
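
Illustrative sketch: The summary mentions an alternating inference algorithm that minimizes an energy over supervoxel labels and stage parameters. The Python sketch below is not the authors' model. It only illustrates the general idea of alternating minimization on a toy energy, and everything in it is an assumption made for illustration: the function names, the discrete set of stages, the compatibility table, the Potts smoothness term, and the use of iterated conditional modes (ICM) for the label step.

```python
def energy(labels, stage, unary, compat, edges, smooth_weight):
    """Toy energy (hypothetical): unary + stage/label compatibility + Potts."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(compat[stage][l] for l in labels)
    e += smooth_weight * sum(1 for i, j in edges if labels[i] != labels[j])
    return e


def alternating_inference(unary, compat, edges, smooth_weight=1.0, max_iters=20):
    """Alternate between the stage variable and the region labels.

    unary[i][l]  : cost of giving region i label l (assumed input)
    compat[s][l] : cost of label l under stage s (assumed input)
    edges        : pairs (i, j) of neighbouring regions
    """
    n, n_labels, n_stages = len(unary), len(unary[0]), len(compat)

    # Initialise labels from the unary costs alone and start with stage 0.
    labels = [min(range(n_labels), key=lambda l, i=i: unary[i][l]) for i in range(n)]
    stage = 0

    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    best = energy(labels, stage, unary, compat, edges, smooth_weight)
    for _ in range(max_iters):
        # Step 1: labels fixed, pick the stage with the lowest compatibility cost.
        stage = min(range(n_stages),
                    key=lambda s: sum(compat[s][l] for l in labels))

        # Step 2: stage fixed, one ICM sweep over the region labels.
        for i in range(n):
            def local_cost(l, i=i):
                disagree = sum(1 for j in neighbors[i] if labels[j] != l)
                return unary[i][l] + compat[stage][l] + smooth_weight * disagree
            labels[i] = min(range(n_labels), key=local_cost)

        current = energy(labels, stage, unary, compat, edges, smooth_weight)
        if current >= best:  # no further improvement, stop
            break
        best = current

    return labels, stage, energy(labels, stage, unary, compat, edges, smooth_weight)


# Toy problem: four regions in a chain, two labels, two candidate stages.
toy_unary = [[0, 2], [1, 0], [0, 2], [0, 2]]
toy_compat = [[0, 3], [3, 0]]
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_result = alternating_inference(toy_unary, toy_compat, toy_edges)
```

Each step can only keep or lower the toy energy, because the current stage and the current label are always among the candidates considered. The procedure therefore stops at a local minimum, which need not be the global one.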